Abstract. On the basis of the general form for the energy needed to adapt the connection
strengths $w_{ij}$ of a network in which learning takes place, a local learning rule is found for the
changes $\Delta w_{ij}$. This biologically realizable learning rule turns out to comply with Hebb's
neurophysiological postulate, but is not of the form of any of the learning rules proposed in the literature.

The learning rule possesses the property that the energy needed in each learning step is
minimal, and is, as such, evolutionarily attractive. Moreover, the pre- and post-synaptic neurons
are found to influence the synaptic changes differently, resulting in an asymmetric connection matrix
$w_{ij}$, a fact which is in agreement with biological observation.

It is shown that, if a finite set of the same patterns is presented over and over again to the 
network, the weights of the synapses converge to finite values. 

Furthermore, it is proved that the final values found in this biologically realizable limit are 
the same as those found via a mathematical approach to the problem of finding the weights of 
a partially connected neural network that can store a collection of patterns. The mathematical 
solution is obtained via a modified version of the so-called method of the pseudo-inverse, and has 
the inverse of a reduced correlation matrix, rather than the usual correlation matrix, as its basic 
ingredient. Thus, a biological network might realize the final results of the mathematician by the 
energetically economical rule for the adaptation of the synapses found in this article.

Keywords: neural network, Hebb rule, local learning rule, reduced correlation matrix, modified
pseudo-inverse



PACS numbers: 84.35+i, 87.10+e 
1. Introduction 

In this article we consider some theoretical aspects of the changes of the connections as 
they could take place between the nerve cells, or neurons, of the brain. In a learning 
process, these connections change continuously, and are adapted in such a way that a 
particular task, e.g., the storage of patterns, is achieved. The answer to the question 
in which way the connections between neurons actually change, in response to external 
stimuli, can only be given by experiment, not via any theoretical discussion. Although 
there is a lot of experimental activity related to the study of functioning of neurons, 
there is not yet a unique answer to this question: see, e.g., the 1998 review articles of
Buonomano and Merzenich [1] or Marder [2], or the 1990 review article of Brown et al.

In the forties, the Canadian psychologist Hebb conjectured in his now famous book
The Organization of Behavior: A Neuropsychological Theory [?] that the changes of
the connections between the neurons take place according to a 'neuro-physiological
postulate' that nowadays is referred to as Hebb's rule: 'When an axon of cell A is
near enough to excite a cell B and repeatedly or persistently takes part in firing it,
some growth process or metabolic change takes place in one or both cells so that A's
efficiency, as one of the cells firing B, is increased'. Thus Hebb's rule is a qualitative
statement on the enhancement of synaptic efficiency of signal transmission, but does
not state quantitatively, by some mathematical formula, to what extent.

Nowadays, there is a great amount of evidence that synapses do indeed change
in a learning process, and, since the appearance of Hebb's article, many quantitative
proposals, all complying with Hebb's postulate, have been put forward. The
present article is also concerned with such a quantitative expression for the synaptic changes.
However, rather than postulating a learning rule, we derive it from some underlying
principle. As a final result, we find a learning rule for the adaptation of the strengths,
or weights, $w_{ij}$, of a synapse connecting a post-synaptic neuron $i$ and a pre-synaptic
neuron $j$. Its explicit form reads:

$$\Delta w_{ij}(t_n) = \eta_i \left[ \kappa - \{ h_i(t_n) - \theta_i \} (2\xi_i - 1) \right] (2\xi_i - 1)\, \xi_j \tag{1}$$

This asymmetric learning rule gives $\Delta w_{ij}$, the positive or negative increment of the
weight $w_{ij}$, as a function of the activities $\xi_i$ and $\xi_j$ of neurons $i$ and $j$ of the synapse
that connects these neurons. In our convention, the activity $\xi$ of a neuron equals 1 if it
generates an action potential, and 0 if it is quiescent. The function $h_i$ is the potential
difference between the interior and the exterior of a neuron, at its axon hillock. The
formula gives the change at time $t_n$. The index $n$ denotes the time at the $n$-th learning
step in the process of learning ($n = 1, 2, \ldots$). The threshold potential, $\theta_i$, is a constant,
typical for the neuron $i$ in question. It equals, by definition, the potential that must be
surmounted, at the axon hillock of neuron $i$, in order that it will fire. The quantities $\eta_i$
and $\kappa$ are also constants. Their precise identification, as variables related to individual
and collective neuron properties, is outside the scope of the present article. The learning
rule (1), which constitutes our main result as far as biology is concerned, has a form
that is compatible with Hebb's postulate.
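
As an illustration only, the local rule (1) can be written out as a few lines of Python. This is a minimal sketch, not part of the derivation: the function name, the list-based data layout and the numerical values of $\eta_i$, $\kappa$ and $\theta_i$ are assumptions chosen to keep the arithmetic transparent. The sketch applies eq. (1) to every index $j$, including $j = i$; restricting the update to the changing connections is deliberately left out.

```python
def local_rule_increments(w_i, xi, i, theta_i, kappa, eta_i):
    """Increments Delta w_ij of eq. (1) for all synapses j of post-synaptic neuron i."""
    # Post-synaptic potential h_i = sum_j w_ij * xi_j while the pattern is presented.
    h_i = sum(w_ij * xi_j for w_ij, xi_j in zip(w_i, xi))
    sign_i = 2 * xi[i] - 1  # +1 if neuron i is active in the pattern, -1 if quiescent
    prefactor = eta_i * (kappa - (h_i - theta_i) * sign_i) * sign_i
    # The factor xi_j makes the increment vanish for quiescent pre-synaptic neurons.
    return [prefactor * xi_j for xi_j in xi]


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
print(local_rule_increments([0.0, 0.0, 0.0], [1, 1, 0], i=0,
                            theta_i=1.0, kappa=1.0, eta_i=0.5))
# -> [1.0, 1.0, 0.0]
```

The increment is the same for all active pre-synaptic neurons and zero for the quiescent one, which is the locality property stressed in the text: only $\xi_i$, $\xi_j$ and the potential $h_i$ enter.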

It is a well-known fact that, for a given neural net with given strengths of the weights,
there are infinitely many ways to choose changes $\Delta w_{ij}$ of the weights such that the
network will perform storage and retrieval of a new pattern. The derivation of our
learning rule is based on the assumption that, at each instant of the learning process,
the energy needed to change the neural network in order to store a new pattern is
minimal. The requirement that, at each step $n$ of the learning process, the
energy needed is as low as possible turns out to be sufficient to determine uniquely
the way in which the weight of each synapse connecting two arbitrary neurons $i$ and $j$
should be changed, and thus fixes a learning rule for the adaptation of the weights of
all the connections. We will call this learning rule the 'non-local energy saving learning
rule', since it turns out to depend on the state of activity of all neurons $j$ from which
neuron $i$ receives its input. It is given by eq. (42) below.

It is impossible, however, for a synapse connecting two neurons $i$ and $j$ to realize
the non-local energy saving learning rule (42) exactly, as follows from a careful inspection
of formula (42). In fact, in order to adapt itself according to this learning rule, a
synapse between $i$ and $j$ would have to 'know' the individual states of activity of all
pre-synaptic neurons $k$ from which neuron $i$ gets its input, whereas a synapse only 'feels'
the states of the two neurons $i$ and $j$ which it connects. The best a synapse can do in
order to compete with the performance of the non-local learning rule (42) is to adapt
itself according to a learning rule that is a local approximation of the non-local learning
rule. It is this local approximation, given by the expression (1) above, which constitutes
our main biological result. We will refer to it as the local energy saving learning rule,
to distinguish it from its non-local counterpart. The point of locality of learning rules
is discussed in more detail in section [?].

A numerical estimation of the performance of the local learning rule, eq. (1), versus
the non-local one, eq. (42), is made in section [?]. Local learning turns out to be a
very effective alternative to non-local learning, both regarding its power to store
and retrieve patterns and with respect to its capacity to be economical in its use of energy.

In order to arrive at the non-local energy saving learning rule, we think of a neuron 
as a living cell. A living cell, as a physical object, is a stationary non-equilibrium 
system. The basic assumption of this article is that any type of change of the cellular
cleft can only be effected by adding energy to the non-equilibrium system, regardless
of whether it leads to a strengthening or a weakening of the synaptic efficacy.
This is a plausible, but not totally trivial postulate, which can only be falsified by a
detailed biophysical or biochemical study of the process of change of the synapse. In our
setup, the mere assumption that extra energy is needed for any change of the synapse,
regardless of whether it leads to an increase or a decrease of its efficiency,
replaces Hebb's postulate on efficiency cited above.

Before starting the derivation of the energy saving learning rule itself, we discuss,
in section 3, the 81 possibilities which, in principle, are compatible with Hebb's
postulate. In particular, we consider these mathematical realizations with respect to
their biological plausibility. We then find that, in fact, out of the 81 learning rules that
are possible in principle, only two are also biologically plausible. These are the learning
rules (20) and (21).

The actual derivation of the energy saving learning rule is performed in section 4.
To our satisfaction, its general form turns out to imply the two forms (20) and (21)
expected in the preceding section on biological grounds only. Thus our 'principle of
minimal change of energy', which might lead, a priori, to any of the 81 possibilities for
a realization of a learning rule for the change of weight of a synapse, happens to yield
precisely those rules which are biologically plausible.

In section 5 we consider the situation that the changes of the connections do not
take place in an energetically optimal way, but in such a way that patterns are not
partially wiped out when new patterns are learned, as is the case for learning based on
the energy saving learning rule (1) or (53). We then ask ourselves which
learning rule would be found in that case for the changes $\Delta w_{ij}$ of the synaptic weights. Again,
its general form turns out to comply with one of the 81 possible realizations of the Hebb
rule considered in section 3, but, in this case, it is a biologically improbable one. We
therefore do not pursue this path any further.

The question might arise whether the weights obtained with the non-local energy saving
learning rule converge in the limit that the number of learning steps tends to infinity and,
if so, to what values they would then converge. The answers to these questions are the
subject of section [?].

There exists a well-known way to obtain the final form of the connection strengths
$w_{ij}$ of an artificial neural network that can store and retrieve a set of patterns: it
goes under the name 'pseudo-inverse solution' [?]. By inversion of a certain matrix
related to the patterns to be stored, the so-called correlation matrix, one can obtain, 
without any limiting procedure, final values for the weights of the connections of a 
neural network that yield the desired result of being capable of storing and retrieving a 
collection of patterns. 

We will consider an assembly of $N$ neurons, where $N$ is a number relevant for a
certain subunit of the brain, such as a cortical hyper-column, for which $N$ is of the order
of $10^4$ to $10^5$. Although such subunits are highly interconnected, they are partially
connected in the mathematical sense, since each neuron is connected to only a finite
fraction of the subunit considered. Moreover, biological neurons are not self-connected,
i.e., $w_{ii} = 0$. These two biological facts force us to study, from the very beginning,
diluted, or partially connected, networks. In the limit that the dilution tends to zero,
we rediscover, if we relax the requirement that the self-connections all vanish, some of
the well-known results for fully connected networks, in particular those of Diederich and
Opper [?], and of Linkevich [?].

A possible question one might now ask is: is there any relation between the final
values for the weights $w_{ij}$ obtained in the limit of an infinite number of learning
steps, $n \to \infty$, on the one hand, and the values obtained via the pseudo-inverse method
on the other hand? The answer to this question is as simple as it is amazing: the results
are identical. The proof of this point is the subject of appendix B, where the method
of the pseudo-inverse is modified in such a way that it can be used for partially connected
networks. Thus, as a final conclusion, we can state that (i) the assumption of economy of
energy in a learning step, (ii) the well-known method based on the pseudo-inverse of the
correlation matrix and (iii) biological plausibility of a learning rule are three members of
a trio that work in concert. We want to stress, once again, that the question whether the
evolutionary development of the brain actually has led to an adaptation process of the
synapses that is energetically the most economical is, as yet, experimentally an open
question. It is not excluded that the realization of the changes of the synapses might
take place in a biologically less probable, or an energetically less favourable way. Our
only certainty is that economy of energy and biological probability go hand in hand.

Usually, neural networks have been modeled in the so-called spin-representation, 
which, in principle, can easily be translated to the so-called binary representation, which 
models the biological reality more directly. In particular, in the binary representation 
the thresholds for activation of a neuron can be taken constant, in accordance with the 
biological reality. In the spin-representation, however, the actual biological reality in
a learning process can only be modeled via the use of a time-dependent threshold, a
fact which is often overlooked: one erroneously treats the neuron thresholds in the spin
model as constants, see, e.g., [?, 10]. We therefore have chosen not to use the spin, but
the binary representation.

In our study of the connections $w_{ij}$ and the way in which they change in a learning
process, we will neglect two constraints set by nature. Firstly, the fact that, for an
actual neuron $i$, the magnitudes of the synaptic connections are within some interval
characteristic for the synapse in question. Secondly, the fact that, according to Dale's
law, the connections related to one and the same pre-synaptic neuron either are only
excitatory or only inhibitory. Furthermore, we treat biological neurons as McCulloch
and Pitts neurons, i.e., their response to input is according to the rule (2)-(3) below.
We thus also neglect the retardation which results from the finite speed of transmission
of signals through axons and dendrites. A way in which retardation could be included in a model
has been put forward in [11].

For an introduction to this article, see textbooks such as [12], [?].

2. Attractor neural network model

Dynamics. We consider a network of $N$ interconnected neurons in the binary
representation, i.e., each neuron can have a state $x_i = 1$ (the neuron produces one
action potential or spike) or $x_i = 0$ (the neuron is quiescent). The post-synaptic potential of
neuron $i$ at time $t$ of this system of neurons is modeled by

$$h_i(t) = \sum_{j=1}^{N} w_{ij}(t)\, x_j(t), \qquad (i = 1, \ldots, N), \tag{2}$$

where the $x_j(t)$ are the input signals at time $t$ and where the $w_{ij}(t)$ are the weights,
also called synaptic strengths or synaptic efficacies, at time $t$. A weight takes into
account the overall effect of a synaptic connection between a post-synaptic neuron $i$ and
a pre-synaptic neuron $j$ and may be positive (excitation), negative (inhibition) or zero
(no synaptic connection). The weights $w_{ij}$, like the potentials $h_i$, are expressed in Volts.
The output of neuron $i$ is supposed to be given by the dynamical equation

$$x_i(t + \Delta t) = \Theta_H \{ h_i(t) - \theta_i \}, \qquad (i = 1, \ldots, N), \tag{3}$$

where the constant $\theta_i$ is the activation threshold characteristic of neuron $i$ and where $\Delta t$
is some discrete time step. A typical value for $\theta_i$ is 10 mV [?]. The symbol $\Theta_H$ stands
for the Heaviside step function, which equals one for positive arguments and vanishes
otherwise.
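
The dynamics (2)-(3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the nested-list representation of $w_{ij}$ and the two-neuron example values (in Volts) are hypothetical and not taken from the article.

```python
def heaviside(u):
    """Theta_H: one for positive arguments, zero otherwise."""
    return 1 if u > 0 else 0


def potentials(w, x):
    """Eq. (2): h_i = sum_j w_ij * x_j for every neuron i."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def update_state(w, x, theta):
    """Eq. (3): x_i(t + dt) = Theta_H(h_i(t) - theta_i), all neurons in parallel."""
    return [heaviside(h_i - th_i) for h_i, th_i in zip(potentials(w, x), theta)]


# Hypothetical two-neuron example: weights of 20 mV, thresholds of 10 mV.
w = [[0.0, 0.02],
     [0.02, 0.0]]
print(update_state(w, [1, 0], theta=[0.01, 0.01]))  # -> [0, 1]
```

In this example the active neuron receives no input and falls silent, while its neighbour is driven above threshold and fires.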

In the so-called 'spin-representation', active and non-active states of neuron $i$ are
characterized by $s_i = 1$ or $s_i = -1$, respectively. In this representation, the dynamical
equation (3) can be rewritten as

$$s_i(t + \Delta t) = \mathrm{sgn}\Big\{ \sum_{j=1}^{N} J_{ij}(t)\, s_j(t) - T_i(t) \Big\}, \qquad (i = 1, \ldots, N), \tag{4}$$

where the time dependent 'coupling constants' $J_{ij}$ are related to the biological weights
$w_{ij}$ through $J_{ij} = w_{ij}/2$ and where $s_j = 2x_j - 1$. The time dependent 'thresholds' $T_i(t)$
are related to the constant biological thresholds $\theta_i$ according to

$$T_i(t) = \theta_i - \sum_{j=1}^{N} J_{ij}(t), \qquad (i = 1, \ldots, N). \tag{5}$$

In the literature the thresholds $T_i(t)$ are usually treated as constants; most often the
constant is taken to vanish [10]. This seemingly innocent fact changes, of course,
the dynamics (4) of the system in a non-trivial way. As a consequence, the results
obtained for, e.g., the adaptation of the coupling constants differ from those obtained
when the actual biological dynamics (3) is used [cf. eqs. (?) and (45)]. Hence, when
modeling adaptation processes of biological neurons with constant thresholds, the use
of the binary representation is obligatory.
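
The equivalence of the binary dynamics (3) and the spin dynamics (4) with the thresholds (5) can be checked numerically on a small network. The following sketch is illustrative only; the function names and the three-neuron weights are hypothetical, and the convention sgn(0) = -1 is an assumption made here so that the sign function matches $\Theta_H(0) = 0$.

```python
from itertools import product


def binary_step(w, x, theta):
    """Eq. (3): x_i(t + dt) = Theta_H(sum_j w_ij x_j - theta_i)."""
    return [1 if sum(w_ij * x_j for w_ij, x_j in zip(row, x)) - th > 0 else 0
            for row, th in zip(w, theta)]


def spin_step(w, s, theta):
    """Eq. (4) with J_ij = w_ij / 2 and the thresholds T_i of eq. (5)."""
    J = [[w_ij / 2 for w_ij in row] for row in w]
    T = [th - sum(row) for row, th in zip(J, theta)]
    # Assumption: sgn(0) is taken as -1 so that it matches Theta_H(0) = 0.
    return [1 if sum(J_ij * s_j for J_ij, s_j in zip(row, s)) - T_i > 0 else -1
            for row, T_i in zip(J, T)]


def representations_agree(w, theta):
    """Compare one update step in both representations for every network state."""
    for x in product((0, 1), repeat=len(w)):
        s = [2 * x_j - 1 for x_j in x]
        x_next = binary_step(w, list(x), theta)
        if [2 * x_i - 1 for x_i in x_next] != spin_step(w, s, theta):
            return False
    return True


# Hypothetical weights and thresholds (arbitrary units).
w = [[0, 2, -1], [1, 0, 3], [-2, 1, 0]]
print(representations_agree(w, theta=[0.5, 1.5, -0.5]))  # -> True
```

Since $T_i$ depends on the row sums of $J_{ij}$, it changes whenever the weights change, which is why a constant threshold in the spin model describes a different system.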

Neural networks have two time scales, one related to the rate of change of the
synaptic efficacies $w_{ij}$ and one related to the spiking activity of a neuron. The latter
time is of the order of milliseconds, the former is less well defined, but can be estimated
to lie somewhere between seconds and days: it is a time related to the rate of learning of
a brain. Hence, the $\Delta t$ occurring in equation (3) is of the order of milliseconds. When
the process of adaptation of the weights has come to an end, the $w_{ij}$ remain constant.



Fixed points. We want to determine the synaptic efficacies of an attractor neural
network, i.e., of a network which can recall a number, $p$ say, of previously stored patterns.
The realization of a recall corresponds to a fixed network state of the network dynamics
(3). Let us denote the patterns of activity, or patterns, by $\xi^\mu = (\xi_1^\mu, \ldots, \xi_N^\mu)$, where
$\mu = 1, \ldots, p$. Thus $\xi_i^\mu = 1$ or $\xi_i^\mu = 0$ with $i = 1, \ldots, N$ and $\mu = 1, \ldots, p$. The
probability that a neuron $i$ is in the state 1 or 0 is supposed to be given by $a$ or $(1 - a)$,
respectively. The quantity $a$ is usually called the mean activity of the neural net. For
random patterns the mean activity $a$ is given by 0.5. In biological neural networks,
however, the mean activity $a$ is smaller [?].

Thus, a network which has stored, somehow, $p$ patterns $\xi^\mu$ satisfies the fixed point
equations

$$x_i(t + \Delta t) = x_i(t), \quad \text{for } x_i(t) = \xi_i^\mu, \qquad (i = 1, \ldots, N;\ \mu = 1, \ldots, p). \tag{6}$$

Hence, equations (3) and (6) yield the $pN$ equations

$$\xi_i^\mu = \Theta_H \Big\{ \sum_{j=1}^{N} w_{ij}\, \xi_j^\mu - \theta_i \Big\} \tag{7}$$

for $N^2$ unknown $w_{ij}$'s.



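Let us now introduce so-called stability coefficients $\gamma_i^\mu$ [?]:

$$\gamma_i^\mu(t) := \big( h_i^\mu(t) - \theta_i \big) (2\xi_i^\mu - 1), \tag{8}$$

with $h_i^\mu$ the post-synaptic potential

$$h_i^\mu(t) = \sum_{j=1}^{N} w_{ij}(t)\, \xi_j^\mu. \tag{9}$$

Note that $\gamma_i^\mu$ depends, via $h_i^\mu$, on all weights $w_{ij}$, i.e., $\gamma_i^\mu(t) =
\gamma_i^\mu(w_{11}(t), w_{12}(t), \ldots, w_{N-1,N}(t), w_{NN}(t))$.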

One easily checks, by distinguishing the cases $\xi_i^\mu = 1$ and $\xi_i^\mu = 0$, that an equivalent
way to express the equalities (7) are the $pN$ inequalities

$$\gamma_i^\mu(t) > 0. \tag{10}$$

The inequality sign in (10) reflects the fact that the set of equations (7) is
under-determined, i.e., the eqs. (10) are necessary but not sufficient equations to determine
uniquely a set of weights of a network which has stored some patterns.
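
As a small illustration of (8)-(10), the sketch below computes the stability coefficients of a pattern and compares their signs with a direct fixed-point test of the dynamics (3). The function names and the two-neuron network are hypothetical, and the example values are chosen away from the boundary case $h_i = \theta_i$.

```python
def stability_coefficients(w, xi, theta):
    """Eqs. (8)-(9): gamma_i = (h_i - theta_i) * (2 xi_i - 1) for one pattern xi."""
    gammas = []
    for i, row in enumerate(w):
        h_i = sum(w_ij * xi_j for w_ij, xi_j in zip(row, xi))  # eq. (9)
        gammas.append((h_i - theta[i]) * (2 * xi[i] - 1))      # eq. (8)
    return gammas


def is_fixed_point(w, xi, theta):
    """Direct check of eq. (7): the pattern reproduces itself under eq. (3)."""
    nxt = [1 if sum(w_ij * xi_j for w_ij, xi_j in zip(row, xi)) - th > 0 else 0
           for row, th in zip(w, theta)]
    return nxt == list(xi)


# Hypothetical network: two mutually excitatory neurons.
w = [[0, 1], [1, 0]]
theta = [0.5, 0.5]
for pattern in ([1, 1], [0, 0], [1, 0]):
    print(pattern, stability_coefficients(w, pattern, theta),
          is_fixed_point(w, pattern, theta))
# [1, 1] [0.5, 0.5] True
# [0, 0] [0.5, 0.5] True
# [1, 0] [-0.5, -0.5] False
```

In this example a pattern is a fixed point exactly when all its stability coefficients are positive.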

An arbitrary pattern $x(t)$ will only be recalled if it evolves in time to one of the
fixed points $\xi^\mu$. Therefore, it is not sufficient for a network to have fixed points: for each
of the $p$ fixed points that is related to a retrieval of a pattern there must exist a whole
neighborhood of points around $\xi^\mu$ which is such that all points of this neighborhood will
evolve to $\xi^\mu$ under the dynamics (3). In technical terms, the fixed points $\xi^\mu$ must have
a non-zero basin of attraction. For this reason, one may introduce [?, 10, 18] a positive
threshold $\kappa$, and demand the stronger inequalities

$$\gamma_i^\mu(t) \geq \kappa \tag{11}$$

to hold, rather than the inequalities (10), which are equivalent to the fixed point
equations (7). The larger the threshold $\kappa$, the larger the basins of attraction can
be expected to be [10], [18].



In order to solve the equations (11) for the unknown weights $w_{ij}$, we consider only
the case in which the equality sign holds. Then (11) can be recast in the equivalent form

$$\sum_{j=1}^{N} w_{ij}(t)\, \xi_j^\mu - \theta_i = \kappa\, (2\xi_i^\mu - 1), \qquad (i = 1, \ldots, N;\ \mu = 1, \ldots, p), \tag{12}$$

as may be checked by putting $\xi_i^\mu$ equal to 1 or 0. The $pN$ equations (12) do not fix
uniquely the $N^2$ weights $w_{ij}$ as long as $p < N$, the case we consider throughout this
article. The storage capacity $\alpha$, defined as $\alpha := p/N$, of a neural network is maximally
equal to one for networks described by eqs. (12).
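
To make the content of (12) concrete, the following sketch computes the post-synaptic potentials that (12) prescribes for a given pattern and counts equations against unknowns. It is an illustration only; the function names and numbers are hypothetical.

```python
def target_potentials(xi, theta, kappa):
    """Eq. (12) solved for the potential: h_i = theta_i + kappa * (2 xi_i - 1)."""
    return [th + kappa * (2 * x - 1) for x, th in zip(xi, theta)]


def equations_and_unknowns(n_neurons, n_patterns):
    """Eq. (12) gives p*N linear equations for the N^2 weights w_ij."""
    return n_patterns * n_neurons, n_neurons ** 2


# Hypothetical values: an active neuron must sit kappa above threshold,
# a quiescent neuron kappa below it.
print(target_potentials([1, 0], theta=[0.5, 0.5], kappa=0.25))  # -> [0.75, 0.25]
print(equations_and_unknowns(n_neurons=100, n_patterns=30))     # -> (3000, 10000)
```

With $p < N$ there are fewer equations than unknowns, which is the under-determination that the energy argument of section 4 is meant to resolve.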



Various types of networks. It is our aim to take into account specific aspects of the
connectivity of a biological network. In a biological neural network a neuron does not
excite or inhibit itself, i.e., for all $t$ we have for the self-interactions (or self-connections)

$$w_{ii}(t) = 0, \qquad (i = 1, \ldots, N). \tag{13}$$

Moreover, a biological network will, in general, be partially connected: each neuron will
have some neighbourhood outside which there are no connections, i.e.,

$$w_{ij}(t) = 0, \tag{14}$$

for a given set of neuron pairs $(i, j)$. We shall call a network in which a (finite) fraction
of the weights vanish a diluted network. Let $M$ be the number of pairs $(i, j)$ for which
$w_{ij}(t) = 0$. Then the dilution $d$ of a network of $N$ neurons is defined as

$$d := M / N^2. \tag{15}$$

Hence, the dilution $d$ is a number between 0 and 1.

Let us slightly generalize the above by distinguishing in a learning process changing
and non-changing connections $w_{ij}(t)$ instead of changing and vanishing connections. Let
us consider, for a moment, one particular neuron $i$. Then one may define the index sets

$$V_i := \{ j \mid w_{ij}(t) \neq w_{ij}(t_0) \}, \qquad V_i^c := \{ j \mid w_{ij}(t) = w_{ij}(t_0) \}. \tag{16}$$

Thus $V_i$ contains the indices related to all connections of neuron $i$ that, in a learning
process, change in time, whereas its complement, $V_i^c$, contains the indices related to
all non-changing connections. In particular, $V_i^c$ contains the index of neuron $i$ itself
[$w_{ii}(t) = w_{ii}(t_0) = 0$], the indices of neurons $j$ which have no connections with neuron $i$
[$w_{ij}(t) = w_{ij}(t_0) = 0$], and the indices of neurons $j$ which have connections with fixed
strengths with neuron $i$ [$w_{ij}(t) = w_{ij}(t_0) \neq 0$]. Thus, diluted networks are a subclass
of networks with changing and non-changing connections. By specifying, via eq. (16),
which connections are absent, the network connectivity is completely defined. For later
use, we introduce M, the number of pairs $(i, j)$ for which $w_{ij}(t) = w_{ij}(t_0)$ is constant,
but not necessarily equal to zero.
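
The bookkeeping of (15) and (16) may be sketched as follows. The representation of the index sets by a boolean mask `changing[i][j]` is an assumption made for this illustration (the text defines the sets through the time dependence of the weights), and all names and values are hypothetical.

```python
def dilution(w):
    """Eq. (15): d = M / N^2, with M the number of vanishing weights."""
    n = len(w)
    m = sum(1 for row in w for w_ij in row if w_ij == 0)
    return m / n ** 2


def index_sets(changing, i):
    """Eq. (16): V_i (changing connections of neuron i) and its complement V_i^c."""
    v_i = [j for j, flag in enumerate(changing[i]) if flag]
    v_i_c = [j for j, flag in enumerate(changing[i]) if not flag]
    return v_i, v_i_c


# Hypothetical 3-neuron network without self-connections.
w = [[0, 1, 0],
     [2, 0, 0],
     [0, 3, 0]]
changing = [[w_ij != 0 for w_ij in row] for row in w]  # assume only existing synapses adapt
print(dilution(w))              # 6 of the 9 entries vanish
print(index_sets(changing, 0))  # -> ([1], [0, 2])
```

Here the diagonal entries count towards $M$, in line with the definition of $d$ as a fraction of all $N^2$ pairs.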

3. Learning prescriptions — Hebb rules 

In this section we will consider all mathematical realizations which are, in principle,
compatible with Hebb's postulate. We will argue that, in our view, only two of them,
namely (20) and (21), are biologically plausible, in contrast to the realizations (22) and
(23) used in the literature. In order to show this, let us consider a network with changing
and non-changing connections, in which a learning process takes place with the purpose
to store a collection of $p$ patterns $\xi^\mu$. Let the weights at time $t_n$ be given by $w_{ij}(t_n)$.
After a learning step the new weights will be given in terms of the old weights by

$$w_{ij}(t_{n+1}) = \begin{cases} w_{ij}(t_n) + \Delta w_{ij}(t_n), & j \in V_i, \\ w_{ij}(t_n), & j \in V_i^c, \end{cases} \tag{17}$$

where $\Delta w_{ij}(t_n)$ is the increment at time $t_n$. A learning rule is a recipe for the change
$\Delta w_{ij}$ as a function of the states of the post-synaptic neuron $i$ and the pre-synaptic neuron
$j$ when a pattern $(\xi_1, \ldots, \xi_N)$ is presented to the network. There are four possible states
$(\xi_i, \xi_j)$ that the post- and pre-synaptic neuron can have, namely (0, 0), (0, 1), (1, 0) and
(1, 1), each of which may lead to one of the three possible changes for $\Delta w_{ij}$: positive,
negative or zero. Hence, in principle there are $3^4 = 81$ possible learning rules

$$\Delta w_{ij} : (\xi_i, \xi_j) \mapsto \Delta w_{ij}(\xi_i, \xi_j). \tag{18}$$

Table 1. The 81 possible ways in which $w_{ij}$ may change as a function of the activities of the
post-synaptic neuron $i$ and the pre-synaptic neuron $j$ can be read off from the 81 columns of the
table. Each row may have up arrows (↑), down arrows (↓) or zeros, indicating a strengthening,
a weakening or no change of a synaptic connection. The biological reason to reject a column is
indicated by the letter a or b immediately below the column. The reasons are a: there either
is only strengthening or weakening of the synapse, b: there is a change of the synaptic strength
if the pre-synaptic neuron $j$ is inactive. From the table we can read off that 78 possibilities are
excluded for reason a and/or b. The column with only zeros is excluded for obvious reasons. The
two possibilities for the Hebb rule which we are left with are indicated by the symbols H and A:
the first corresponds to what is called Hebbian learning, the second to what is called anti-Hebbian
learning. If we do not reject a possibility for reason b, there are many more possible Hebbian
rules. The possibility indicated by G was used by Gardner [19]. The one preferred by physicists
in their modeling of neural networks has been indicated by the symbol P.

It is biologically improbable that connections will always grow or will always
decrease. Therefore, we exclude learning rules for which $\Delta w_{ij}$ for all four states
$(\xi_i, \xi_j)$ are either always positive, or always negative (reason of rejection a of table 1).
Moreover, in our opinion, it is biologically probable that a connection between a
pre-synaptic neuron $j$ and a post-synaptic neuron $i$ does not change if the neuron $j$ does
not contribute to the post-synaptic potential of neuron $i$, i.e., if $\xi_j = 0$. Therefore, we
exclude learning rules for which $\Delta w_{ij}(\xi_i, \xi_j = 0) \neq 0$ with $\xi_i = 0, 1$ (reason of rejection
b of table 1).

Excluding these improbable learning rules, we are left with no more than two
learning rules, as may be verified by a simple inspection of table 1. One of these
corresponds to the assignments

$$\begin{aligned}
(0, 0) &\mapsto \Delta w_{ij} = 0, & (0, 1) &\mapsto \Delta w_{ij} < 0, \\
(1, 0) &\mapsto \Delta w_{ij} = 0, & (1, 1) &\mapsto \Delta w_{ij} > 0,
\end{aligned} \tag{19}$$

(column H in table 1), which can be expressed compactly by the formula

$$\Delta w_{ij} = \epsilon_{ij} (2\xi_i - 1)\, \xi_j, \tag{20}$$

where the $\epsilon_{ij}$, here and elsewhere in this article, are positive numbers. Similarly, the
other one can be expressed by the formula

$$\Delta w_{ij} = -\epsilon_{ij} (2\xi_i - 1)\, \xi_j \tag{21}$$

(column A in table 1).
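
The counting argument behind table 1 can be reproduced by brute force. The sketch below enumerates all $3^4 = 81$ sign assignments and applies the two rejection criteria; it assumes that reason a is to be read as "all non-zero changes have the same sign", which is the reading that reproduces the counts quoted in the caption of table 1. The function name and the encoding of ↑, ↓, 0 as +1, -1, 0 are choices made for this illustration.

```python
from itertools import product

STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (xi_i, xi_j): post-, pre-synaptic activity


def plausible_rules():
    """Return the sign assignments that survive rejection reasons a and b."""
    survivors = []
    for signs in product((-1, 0, 1), repeat=4):  # 3^4 = 81 candidate rules
        rule = dict(zip(STATES, signs))
        # Reason b: no change is allowed when the pre-synaptic neuron is inactive.
        if rule[(0, 0)] != 0 or rule[(1, 0)] != 0:
            continue
        # Reason a (and the trivial all-zero rule): a rule needs both a
        # strengthening and a weakening entry to survive.
        if not (any(s > 0 for s in signs) and any(s < 0 for s in signs)):
            continue
        survivors.append(rule)
    return survivors


for rule in plausible_rules():
    print(rule)
# {(0, 0): 0, (0, 1): -1, (1, 0): 0, (1, 1): 1}   (column H, eq. (20))
# {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): -1}   (column A, eq. (21))
```

Exactly two rules survive, so 78 of the 81 candidates are rejected for reason a and/or b, and one (the all-zero rule) is discarded as trivial.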

Learning can be classified as Hebbian or anti-Hebbian. Hebbian learning is
characterized by the fact that, if both neurons $i$ and $j$ are active, $\Delta w_{ij}$ is positive,
whereas for anti-Hebbian learning $\Delta w_{ij}$ is negative. So, the two remaining learning
rules (20) and (21) are Hebbian and anti-Hebbian, respectively. The learning rules (20)
and (21) have, to the best of our knowledge, not been used, as yet, in mathematical or
physical studies that tried to model biological neural systems (see, e.g., [11], [?]).

If we allow for the possibility that $\Delta w_{ij} \neq 0$ if the pre-synaptic neuron $j$ is inactive
($\xi_j = 0$), there are many extra possible mappings (18), of which we mention the two
most often encountered in the literature:

$$\Delta w_{ij} = \epsilon_{ij}\, \xi_i (2\xi_j - 1), \tag{22}$$

$$\Delta w_{ij} = \epsilon_{ij} (2\xi_i - 1)(2\xi_j - 1). \tag{23}$$

The learning rule (22) was used, e.g., by Gardner [19] in studying the retrieval properties
of a neural network with an asymmetric learning rule (row G in table 1). The learning
rule (23) is the one most often used by physicists [?, 21] in their modeling of neural
networks (row P in table 1).

Finally, let us compare the four learning rules (20)-(23) after one learning step of
one pattern $\xi$. Let us suppose that a pattern $\xi$ is not yet learned at time $t_0$ so that,
in view of (10), the quantity $\gamma_i(t_0)$ is negative. In order to store a pattern, $\gamma_i$ should
be positive. Upon substitution of the Hebbian or symmetric learning rules (20) or (23)
into (8) we find

$$\gamma_i(t_1) = \gamma_i(t_0) + \sum_{j \in V_i} \epsilon_{ij}\, \xi_j; \tag{24}$$

for the anti-Hebbian learning rule (21) we get

$$\gamma_i(t_1) = \gamma_i(t_0) - \sum_{j \in V_i} \epsilon_{ij}\, \xi_j, \tag{25}$$

whereas for the asymmetric learning rule (22) we obtain

$$\gamma_i(t_1) = \gamma_i(t_0) + \xi_i \sum_{j \in V_i} \epsilon_{ij}\, \xi_j, \tag{26}$$

where $t_0$ is the initial time and $t_1$ is the time after one learning step. By a suitable
choice for $\epsilon_{ij}$ it can always be achieved that $\gamma_i(t_1)$ is positive in case of the Hebbian and
symmetric learning rules (20) and (23), whatever the values of $\xi_i$ and $\xi_j$ are, as follows
from (24). This can never be achieved in case of the anti-Hebbian learning rule (21), as
is seen from (25). Finally, in case $\xi_i = 0$, this can never be achieved for the asymmetric
learning rule (22), as can be read off from (26). These simple arguments show that
the Hebbian and symmetric learning rules (20) and (23), but not the anti-Hebbian
and asymmetric learning rules (21) and (22), are, in principle, suitable for storage of
patterns.
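
The increments of $\gamma_i$ in (24)-(26) follow from inserting each rule into (8), using $\xi^2 = \xi$ and $(2\xi - 1)^2 = 1$ for binary activities. The sketch below checks this numerically. It assumes, for illustration only, a uniform $\epsilon_{ij} = \epsilon$ and takes $V_i$ to be all $j \neq i$; the function names and rule labels are hypothetical.

```python
def delta_w(rule, eps, xi_i, xi_j):
    """Weight increment of the four learning rules (20)-(23)."""
    if rule == "hebbian":        # eq. (20)
        return eps * (2 * xi_i - 1) * xi_j
    if rule == "anti-hebbian":   # eq. (21)
        return -eps * (2 * xi_i - 1) * xi_j
    if rule == "asymmetric":     # eq. (22)
        return eps * xi_i * (2 * xi_j - 1)
    if rule == "symmetric":      # eq. (23)
        return eps * (2 * xi_i - 1) * (2 * xi_j - 1)
    raise ValueError(rule)


def gamma_increment(rule, xi, i, eps=1):
    """gamma_i(t_1) - gamma_i(t_0) from eq. (8), with V_i assumed to be all j != i."""
    sign_i = 2 * xi[i] - 1
    return sign_i * sum(delta_w(rule, eps, xi[i], xi_j) * xi_j
                        for j, xi_j in enumerate(xi) if j != i)


xi = [0, 1, 1]  # post-synaptic neuron 0 is quiescent, two active inputs
for rule in ("hebbian", "anti-hebbian", "asymmetric", "symmetric"):
    print(rule, gamma_increment(rule, xi, i=0))
# hebbian 2
# anti-hebbian -2
# asymmetric 0
# symmetric 2
```

For a quiescent post-synaptic neuron the asymmetric rule leaves $\gamma_i$ unchanged and the anti-Hebbian rule lowers it, in agreement with (25) and (26).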

In the next section we will show that the requirement that synaptic changes take
place in an energetically economical way leads to a learning rule which, depending on the
state of the post-synaptic neuron $i$, is of the Hebbian or anti-Hebbian form (20) or (21).
Hence, the naive approach of this section, which leads to the two forms (20) and (21),
is consistent with an approach which is based on a physical principle.



4. Energy saving learning rule 

In the literature, Hebb rules for the change of the synaptic connections have been derived 
in various manners, many of which essentially correspond to the determination of an 
extremum of some 'Lyapunov' or 'cost function', also called 'energy function' 

$$H(t) = -\frac{1}{2} \sum_{i,j=1}^{N} J_{ij}(t)\, s_i(t)\, s_j(t). \tag{27}$$



If $J_{ij} = J_{ji}$, eq. (27) is the central equation of the Hopfield model [?]. In case of an
Ising system of atoms with spins, an equation of the form (27) corresponds to the actual
physical energy of the spin-system.
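
For reference, the cost function (27) is straightforward to evaluate. The sketch below is an illustration with hypothetical names and a two-spin example; it is not used in the derivation that follows.

```python
def hopfield_energy(J, s):
    """Eq. (27): H = -(1/2) * sum_ij J_ij s_i s_j for spins s_i = +1 or -1."""
    n = len(s)
    return -0.5 * sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


# Hypothetical symmetric coupling between two spins.
J = [[0, 1], [1, 0]]
print(hopfield_energy(J, [1, 1]))   # -> -1.0 (aligned spins, lower value)
print(hopfield_energy(J, [1, -1]))  # -> 1.0 (anti-aligned spins, higher value)
```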

For a system of neurons, however, an energy function of the form (27) is an ad hoc
postulate. It is not derived from or suggested by some underlying biological, biochemical
or biophysical principle. In other words, the function (27) is, a priori, totally unrelated
to the actual energy of the neural system. Consequently, a 'derivation' leading to a
Hebb rule based on a function of the type (27) (see, e.g., [13]) is just as ad hoc as the
postulate underlying it.

In this section we will show that the Hebb rule (20) and its anti-Hebbian counterpart
(21) can be found by postulating that the (biochemical) energy needed to change the
synapses, in order to store a new pattern $\xi$, is minimal. We thus show that these
particular Hebb rules, and only these, are consistent with a physical principle.
The argument runs as follows.

The energy $\Delta E_{ij}$ needed to change the connection $w_{ij}(t_n)$ to $w_{ij}(t_{n+1})$ will be a
differentiable function of the magnitude of the change $\Delta w_{ij}(t_n)$ occurring in (17):

$$\Delta E_{ij} = f_{ij}(\Delta w_{ij}). \tag{28}$$

If a synapse between the neurons $i$ and $j$ is not changed in a learning step there is no
energy consumed. Hence, the energy change $\Delta E_{ij}$ vanishes if $\Delta w_{ij} = 0$, i.e.,

$$f_{ij}(0) = 0. \tag{29}$$



Derivation of Hebb 's rule 






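Moreover, we assume that a change of a synapse, whether it be a strengthening or a
weakening, can only be achieved by adding energy to the system. Thus, if $\Delta w_{ij} \neq 0$,
we put

$$f_{ij}(\Delta w_{ij}) > 0. \tag{30}$$

The equations (29) and (30) enable us to obtain a useful approximate expression for the
energy change $\Delta E_{ij}$. We first note that a sufficiently smooth function $f(x)$ can be written as
a power series $f(x) = c^{(0)} + c^{(1)} x + c^{(2)} x^2 + \ldots$. Thus, we have for the function (28),
up to terms quadratic in $\Delta w_{ij}$,

$$f_{ij}(\Delta w_{ij}) = c_{ij}^{(0)} + c_{ij}^{(1)} \Delta w_{ij} + c_{ij}^{(2)} \Delta w_{ij}^2, \tag{31}$$

where, in view of (29) and (30), the coefficients have the properties

$$c_{ij}^{(0)} = 0, \qquad c_{ij}^{(1)} = 0, \qquad c_{ij}^{(2)} > 0. \tag{32}$$

The constant term vanishes because of (29); the linear term must vanish because, by (29)
and (30), $f_{ij}$ has a minimum at $\Delta w_{ij} = 0$.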

Furthermore, we take 

c\) = Ci , (33) 

which is equivalent to the supposition that a change of connections related to different
synapses $j = 1, 2, \dots, N$ of the same neuron $i$ needs the same amount of energy. This
assumption simplifies some of the formulae below; it is not essential in the sense that all
conclusions remain unaltered if the simplification (33) is not used, see [23]. The total
change $\Delta E$ in the $n$-th learning step $w_{ij}(t_n) \to w_{ij}(t_{n+1})$, where in principle all $w_{ij}$
with $j \in V_i$ may change, is given by the sum of the individual changes,

$$\Delta E(\Delta w_{kl}) = \sum_{i=1}^{N} \sum_{j \in V_i} f_{ij}(\Delta w_{ij}) , \tag{34}$$

or, inserting (31) with (32) and (33), by

$$\Delta E(w_{kl}(t_{n+1})) = \sum_{i=1}^{N} \sum_{j \in V_i} c_i \left( w_{ij}(t_{n+1}) - w_{ij}(t_n) \right)^2 . \tag{35}$$

The positive constants $c_i$ are characteristic of neuron $i$.

Equation (35) will be our starting point for the derivation of the energy saving
learning rule (42). It is the general form any expression must have that describes the
energy needed to adapt the connection strengths $w_{ij}$ as a function of their changes
$\Delta w_{ij}$. We now will minimize the change in energy $\Delta E$ as a function of the new weights
$w_{kl}(t_{n+1})$ under the constraint (12) using the Lagrange method. This was the reason
to write $\Delta E$ in (35) as a function of the $w_{kl}(t_{n+1})$ rather than as a function of the
$\Delta w_{kl} = w_{kl}(t_{n+1}) - w_{kl}(t_n)$, as was done in (34).

4.1. Storage of one pattern

Let us consider at the $n$-th learning step, i.e., at time $t_n$, the storage of one pattern
$\xi$ in a network with connections given by $w_{ij}(t_n)$. In case of a network with changing
and non-changing weights as introduced in section 0, the expression for the change of
energy is, up to second order in the changes of the synaptic weights, given by (35). Note
that a minimization of the one condition (35) under the constraint induced by the fixed
point equation (12) implies a minimization of the $N^2 - M$ changes $\Delta w_{ij}^2(t_n)$, since a
sum of positive terms is minimal if and only if each term is minimal; recall that $M$ is
the number of synapses with constant weights $w_{ij}$.

For the storage of one single pattern $\xi$, one may rewrite the fixed point equations
(12) in the form

$$g_i(w_{ij}(t_{n+1})) = 0 , \quad (i = 1, \dots, N) , \tag{36}$$

where

$$g_i(w_{ij}(t_{n+1})) = \kappa (2\xi_i - 1) - \sum_{j \in V_i^c} w_{ij}(t_n) \xi_j - \sum_{j \in V_i} w_{ij}(t_{n+1}) \xi_j + \theta_i . \tag{37}$$

The method of Lagrange multipliers tells us that one finds the extrema of (35) subject to
the auxiliary conditions (36) from the $N^2 - M$ equations

$$\frac{\partial \Delta E}{\partial w_{ij}(t_{n+1})} + \sum_{k=1}^{N} \lambda_k \frac{\partial g_k}{\partial w_{ij}(t_{n+1})} = 0 , \quad (i = 1, \dots, N;\ j \in V_i) . \tag{38}$$

Upon substitution of (35) and (37) into this expression, we find the $N^2 - M$ relations

$$w_{ij}(t_{n+1}) = w_{ij}(t_n) + \frac{\lambda_i \xi_j}{2 c_i} , \quad (i = 1, \dots, N;\ j \in V_i) . \tag{39}$$

In the method of Lagrange multipliers the number of constraints equals the number
of Lagrange multipliers $\lambda_i$. Hence, there are $N$ Lagrange multipliers. Since the $N$
multipliers $\lambda_i$ are unequal to zero, it follows from the $N^2 - M$ equations (39) that
$N^2 - M > N$, or $M < N^2 - N$. We now have obtained the $N + N^2 - M$ equations (36)
and (39) for the $N + N^2 - M$ unknowns $\lambda_i$ and $w_{ij}(t_{n+1})$.

The structure of these equations happens to be such that an explicit expression for
the $\lambda_i$'s can be found, and thereupon, an explicit expression for the $w_{ij}(t_{n+1})$'s can be
obtained. The procedure is as follows.

Eliminating the $w_{ij}(t_{n+1})$'s from (36) with the help of (39), leads to

$$\lambda_i = \frac{2 c_i}{\sum_{k \in V_i} \xi_k} \left[ \kappa - \gamma_i(t_n) \right] (2\xi_i - 1) , \tag{40}$$

where we used the property $(\xi_j)^2 = \xi_j$. Substituting this expression for $\lambda_i$ into (39)
yields

$$w_{ij}(t_{n+1}) = w_{ij}(t_n) + \frac{1}{\sum_{k \in V_i} \xi_k} \left[ \kappa - \gamma_i(t_n) \right] (2\xi_i - 1) \xi_j , \quad (j \in V_i) , \tag{41}$$

or, equivalently [see eq. (17)],

$$\Delta w_{ij}(t_n) = \frac{1}{\sum_{k \in V_i} \xi_k} \left[ \kappa - \gamma_i(t_n) \right] (2\xi_i - 1) \xi_j , \quad (j \in V_i) , \tag{42}$$



where $\kappa$ is the positive parameter ( pT|) related to the basins of attraction, and where the
$\gamma_i$ ($i = 1, \dots, N$) are the stability coefficients given by @. We will refer to (42) by the
name of non-local energy saving learning rule, since the denominator of (42) depends on
the input from all neurons $k$ that are connected via changing connections to neuron $i$.
The factor between square brackets

$$\kappa - \gamma_i(t_n) = \kappa - (h_i(t_n) - \theta_i)(2\xi_i - 1) \tag{43}$$

depends solely upon the temporal and environmental state of the post-synaptic neuron
$i$, that is, on its post-synaptic potential $h_i$ at time $t_n$ of the $n$-th learning step, its
threshold $\theta_i$, its activity $\xi_i$ and a parameter $\kappa$. The factor (43) can be positive or
negative. Therefore, the learning rule (42)-(43), derived here from the assumption
of minimal energy change per learning step, happens to coincide with the particular
Hebbian learning rule (20) and its anti-Hebbian counterpart (21) found in section [| on
purely intuitive grounds, grounds which were related to biological plausibility.
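The elimination that leads from the Lagrange conditions (38) to the learning rule (42) can be written out step by step. The following is only a sketch: it assumes the sign convention $\partial \Delta E / \partial w_{ij} + \sum_k \lambda_k\, \partial g_k / \partial w_{ij} = 0$ for the Lagrange condition and uses $h_i(t_n) = \sum_j w_{ij}(t_n) \xi_j$; the final rule does not depend on the sign convention chosen for the multipliers.

```latex
\begin{align*}
\frac{\partial \Delta E}{\partial w_{ij}(t_{n+1})} &= 2 c_i \left[ w_{ij}(t_{n+1}) - w_{ij}(t_n) \right] ,
\qquad
\frac{\partial g_k}{\partial w_{ij}(t_{n+1})} = -\delta_{ki}\, \xi_j \\
\Rightarrow\quad w_{ij}(t_{n+1}) &= w_{ij}(t_n) + \frac{\lambda_i}{2 c_i}\, \xi_j \qquad (j \in V_i) \\
0 = g_i &= \kappa (2\xi_i - 1) - \left[ h_i(t_n) - \theta_i \right] - \frac{\lambda_i}{2 c_i} \sum_{k \in V_i} \xi_k^2 \\
&= \left[ \kappa - \gamma_i(t_n) \right] (2\xi_i - 1) - \frac{\lambda_i}{2 c_i} \sum_{k \in V_i} \xi_k \\
\Rightarrow\quad \Delta w_{ij}(t_n) &= \frac{\left[ \kappa - \gamma_i(t_n) \right] (2\xi_i - 1)\, \xi_j}{\sum_{k \in V_i} \xi_k}
\end{align*}
```

The second form of $g_i$ uses $(2\xi_i - 1)^2 = 1$ and $\xi_k^2 = \xi_k$, both valid for activities $\xi \in \{0, 1\}$. Note that the constant $c_i$ drops out of the final rule.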

We thus have shown that if biological neurons were to adapt their connections
according to the non-local energy saving learning rule (42), this adaptation would
be such that the network would fulfil the fixed point equation (12) for a pattern $\xi$.
Moreover, the learning rule (42) guarantees that the energy needed to rebuild a neural
network with connections $w_{ij}(t_n)$ to a network with connections $w_{ij}(t_{n+1})$ is minimal.
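As an illustration, the non-local rule (42) can be tried out on a toy network. The sketch below is not taken from the article: the function names, the six-neuron pattern, the connectivity sets `V` and the choices $\theta_i = 0$, $c_i = 1$ and $\kappa = 1$ are all assumptions made for the example. It applies the rule once to a tabula rasa network and evaluates the stability coefficients and the quadratic energy (35).

```python
# Sketch of the non-local energy saving rule (42) for a single pattern.
# All names and the toy data are illustrative assumptions, not the authors' code.

def stability(w_row, xi, i, theta_i=0.0):
    """gamma_i = (h_i - theta_i)(2 xi_i - 1), with h_i = sum_j w_ij xi_j."""
    h = sum(w_ij * x_j for w_ij, x_j in zip(w_row, xi))
    return (h - theta_i) * (2 * xi[i] - 1)

def nonlocal_step(w, xi, V, kappa=1.0):
    """Return the weights after one application of rule (42).

    V[i] lists the adaptable synapses j of neuron i; thresholds are zero.
    """
    new_w = [row[:] for row in w]
    for i, row in enumerate(w):
        active = sum(xi[k] for k in V[i])      # denominator of (42)
        if active == 0:
            continue                           # rule (42) is not applicable
        factor = (kappa - stability(row, xi, i)) * (2 * xi[i] - 1) / active
        for j in V[i]:
            new_w[i][j] += factor * xi[j]
    return new_w

def energy_cost(w_old, w_new, c=1.0):
    """Quadratic energy (35), with the same constant c for every neuron."""
    return sum(c * (b - a) ** 2
               for row_old, row_new in zip(w_old, w_new)
               for a, b in zip(row_old, row_new))

N = 6
xi = [1, 1, 0, 1, 0, 0]
# Diluted connectivity: neuron i has no synapse from itself or from neuron i+1.
V = [[j for j in range(N) if j not in (i, (i + 1) % N)] for i in range(N)]
w0 = [[0.0] * N for _ in range(N)]             # tabula rasa
w1 = nonlocal_step(w0, xi, V)
gammas = [stability(w1[i], xi, i) for i in range(N)]
cost = energy_cost(w0, w1)
```

In this example every stability coefficient equals $\kappa$ after a single step, and the energy of neuron $i$ is $\kappa^2 / \sum_{k \in V_i} \xi_k$, so neurons with few active adaptable inputs pay more.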

We conclude this section with some remarks. The energy saving learning rule is
only applicable in those situations in which the denominator is unequal to zero. This
can be translated into a restriction on the $\xi_k$ with $k \in V_i$: at least one of them must be
non-zero. It follows that with a decreasing number of adaptable connections there is an
increasing number of patterns that cannot be stored with the help of the non-local energy
saving learning rule. This effect will be absent when the local energy saving learning rule
is used (see section 6).

When we repeat the derivation of (42) in the spin-representation with time-dependent
thresholds as given by (|), we find again (42) with $\xi$ replaced by $(s + 1)/2$,
i.e.,

$$\Delta J_{ij} \propto s_i (s_j + 1) , \tag{44}$$

as could be expected. If, however, the derivation of (42) is repeated in the spin
representation with $T_i$ taken to be a constant, as is usually done in the
spin-representation, one finds a result which differs from (44), namely

$$\Delta J_{ij} \propto s_i s_j . \tag{45}$$

This is the biologically less relevant result commonly encountered in the physical
literature, as noticed already in section 3: see eq. (23).
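The step from the activity representation to (44) is a one-line substitution, sketched here under the assumption that spins and activities are related by $\xi = (s + 1)/2$ with $s = \pm 1$:

```latex
\begin{align*}
(2\xi_i - 1)\, \xi_j
  &= \left( 2 \cdot \frac{s_i + 1}{2} - 1 \right) \frac{s_j + 1}{2}
   = \frac{1}{2}\, s_i\, (s_j + 1) .
\end{align*}
```

The factor $(s_j + 1)$ vanishes for an inactive pre-synaptic neuron ($s_j = -1$), which is the asymmetry between pre- and post-synaptic neurons that the symmetric form (45) lacks.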

4.2. Storage of p patterns

In the previous section we saw that storage of one pattern $\xi$ can be achieved via a
synaptic change $\Delta w_{ij}$ given by (42). Hence, storage of $p$ patterns $\xi^\mu$ ($\mu = 1, \dots, p$)
might be accomplished by repeated application of the learning rule (42). Let us
therefore consider the following learning process. In a first interval of time, $[t_0, t_1)$,
a first pattern $\xi^1$ is stored via the change $\Delta w_{ij}(t_0)$, leading to the connections
$w_{ij}(t_1) = w_{ij}(t_0) + \Delta w_{ij}(t_0)$, $j \in V_i$. Next, in the interval $[t_1, t_2)$, pattern $\xi^2$ is
stored, etcetera. Finally, pattern $\xi^p$ is stored. We call this sequence of storage of $p$
patterns a learning cycle.

The energy saving learning rule is a storage prescription for a new pattern, which
does not take into account, however, any constraint that would guarantee that
previously stored patterns remain stored. Thus it may occur that storage of a new
pattern will perturb, partially or totally, the storage of an older pattern.

In section 5, on maximal learning efficiency, we will determine a learning rule
which does guarantee that new patterns are stored without wiping out previously stored
patterns. However, this learning rule will turn out to be biologically unacceptable. We
therefore proceed with the learning rule derived above. We shall derive, along the lines
of reasoning of Diederich and Opper [|7| , but for diluted networks, an expression for the
weights $w_{ij}$ of the synaptic connections after infinitely many learning cycles. It will turn
out that, in the end, previously stored patterns are not forgotten.
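The claim that repeated learning cycles eventually store all patterns can be explored numerically on a small example. The sketch below is an illustration only: the three eight-neuron patterns, the fully connected network (all synapses adaptable, self-interactions included), vanishing thresholds and $\kappa = 1$ are assumptions chosen for the demonstration, and the function names are hypothetical.

```python
# Repeated learning cycles with rule (42) on a small, fully adaptable network.
# Toy data and names are illustrative assumptions.

def stability(w_row, xi, i):
    """gamma_i = h_i (2 xi_i - 1) for vanishing thresholds."""
    h = sum(w_ij * x_j for w_ij, x_j in zip(w_row, xi))
    return h * (2 * xi[i] - 1)

def learning_cycle(w, patterns, kappa=1.0):
    """Store every pattern once, in order, using rule (42); w is updated in place."""
    for xi in patterns:
        active = sum(xi)                       # sum over all k, since V_i is complete
        for i, row in enumerate(w):
            factor = (kappa - stability(row, xi, i)) * (2 * xi[i] - 1) / active
            for j, x_j in enumerate(xi):
                row[j] += factor * x_j

def all_stabilities(w, patterns):
    """Stability coefficients for every pattern and every neuron."""
    return [stability(w[i], xi, i) for xi in patterns for i in range(len(w))]

patterns = [
    [1, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 1, 1, 1],
]
N = 8
w = [[0.0] * N for _ in range(N)]              # tabula rasa
learning_cycle(w, patterns)
after_one = all_stabilities(w, patterns)       # only the last pattern is exact
for _ in range(199):
    learning_cycle(w, patterns)
after_many = all_stabilities(w, patterns)      # all patterns close to kappa
```

After one cycle only the pattern stored last is guaranteed to have $\gamma_i = \kappa$; the earlier ones have been perturbed. After many cycles all stability coefficients in this example approach $\kappa$.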

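As follows from eq. (17), the connections after $R$ learning cycles are given by

$$w_{ij}(t_{Rp}) = w_{ij}(t_0) + \sum_{m=1}^{R} \sum_{\mu=1}^{p} \Delta w_{ij}(t_{(m-1)p+\mu-1}) , \quad (j \in V_i) , \tag{46}$$

with $t_{Rp}$ the time after $R$ learning cycles of $p$ patterns.
Substituting (42) into (46) we find

$$w_{ij}(t_{Rp}) = w_{ij}(t_0) + N^{-1} \sum_{\mu=1}^{p} F_i^\mu(t_{(R-1)p+\mu-1})\, \xi_j^\mu , \quad (j \in V_i) , \tag{47}$$

where

$$F_i^\mu(t_{(R-1)p+\mu-1}) = \sum_{m=1}^{R} \Big[ \kappa (2\xi_i^\mu - 1) - \Big( \sum_{k \in V_i^c} w_{ik}(t_0) \xi_k^\mu + \sum_{k \in V_i} w_{ik}(t_{(m-1)p+\mu-1}) \xi_k^\mu - \theta_i \Big) \Big] \Big/ \Big( N^{-1} \sum_{k \in V_i} \xi_k^\mu \Big) \tag{48}$$

is the effect on $w_{ij}$ of pattern $\xi^\mu$ after $R$ learning cycles have been completed. From
(48) it follows that

$$F_i^\mu(t_{(R-1)p+\mu-1}) - F_i^\mu(t_{(R-2)p+\mu-1}) = \Big[ \kappa (2\xi_i^\mu - 1) - \Big( \sum_{k \in V_i^c} w_{ik}(t_0) \xi_k^\mu + \sum_{k \in V_i} w_{ik}(t_{(R-1)p+\mu-1}) \xi_k^\mu - \theta_i \Big) \Big] \Big/ \Big( N^{-1} \sum_{k \in V_i} \xi_k^\mu \Big) . \tag{49}$$

In the $R$-th learning cycle, at time $t_{(R-1)p+\mu-1}$, only the patterns $\xi^1, \dots, \xi^{\mu-1}$ have
changed the weights of the network. Hence, the $F_i^\nu$ with $\nu < \mu$ have new values at time
$t_{(R-1)p+\mu-1}$, whereas the $F_i^\nu$ with $\nu \geq \mu$ are still identical to their values in the previous
learning cycle, i.e., are equal to the values at time $t_{(R-2)p+\nu-1}$. Thus, with the help of
(47), the weights in the right-hand side of (49) can be expressed as follows in terms of
the $F_i^\nu$:

$$w_{ik}(t_{(R-1)p+\mu-1}) = w_{ik}(t_0) + N^{-1} \sum_{\nu < \mu} F_i^\nu(t_{(R-1)p+\nu-1})\, \xi_k^\nu + N^{-1} \sum_{\nu \geq \mu} F_i^\nu(t_{(R-2)p+\nu-1})\, \xi_k^\nu . \tag{50}$$

Eliminating $w_{ik}(t_{(R-1)p+\mu-1})$ from (49) with the help of (50) yields

$$N^{-1} \sum_{\nu \leq \mu} \sum_{k \in V_i} F_i^\nu(t_{(R-1)p+\nu-1})\, \xi_k^\nu \xi_k^\mu = -N^{-1} \sum_{\nu > \mu} \sum_{k \in V_i} F_i^\nu(t_{(R-2)p+\nu-1})\, \xi_k^\nu \xi_k^\mu + \left[ \kappa - \gamma_i^\mu(t_0) \right] (2\xi_i^\mu - 1) . \tag{51}$$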



This system of linear equations can be solved for $F_i^\mu$ using the Gauss-Seidel iterative
method. We first rewrite (51) in matrix notation. To this end, we introduce a $p \times p$ matrix
$C_i$, the matrix-elements of which are given by

$$C_i^{\mu\nu} = N^{-1} \sum_{k \in V_i} \xi_k^\mu \xi_k^\nu . \tag{52}$$

We might call this matrix the 'reduced correlation matrix', since it correlates $\xi^\mu$ and
$\xi^\nu$ while taking into account, via $V_i$, the connectivity of the network. The reduced
correlation matrix is closely related to the usual correlation matrix if $V_i$ contains all
neuron indices. We proceed by decomposing this matrix $C_i$ into matrices $L_i$ and
$U_i$ in such a way that $C_i = L_i + U_i$. The matrix $L_i$ has non-zero matrix-elements
only on and below the diagonal, and $U_i$ has non-zero matrix-elements only above the
diagonal. We also introduce the vectors
$F_i(R) := (F_i^1(t_{(R-1)p}), \dots, F_i^p(t_{(R-1)p+p-1}))$ and
$G_i := ([\kappa - \gamma_i^1(t_0)](2\xi_i^1 - 1), \dots, [\kappa - \gamma_i^p(t_0)](2\xi_i^p - 1))$.
Finally, we shall denote a $p \times p$ unit matrix as $I$. We thus can
rewrite (51) in the form

$$L_i \cdot F_i(R) = -U_i \cdot F_i(R-1) + G_i . \tag{53}$$

By iteratively solving this equation for $F_i(R)$, we find

$$F_i(R) = \left[ -L_i^{-1} \cdot U_i \right]^{R-1} F_i(1) + \left[ I - L_i^{-1} \cdot U_i + \dots + \left( -L_i^{-1} \cdot U_i \right)^{R-2} \right] L_i^{-1} \cdot G_i . \tag{54}$$

The matrix $C_i$, as defined in (52), is symmetric and positive definite. It then
can be shown that the matrix $-L_i^{-1} \cdot U_i$ has eigenvalues smaller than one in absolute
value |p2 |. As a consequence, we have

$$\lim_{R \to \infty} \left[ -L_i^{-1} \cdot U_i \right]^{R-1} = 0 , \tag{55}$$

and it follows that, in the limit $R \to \infty$, $F_i(R)$ converges to

$$F_i(\infty) = \left[ I + L_i^{-1} \cdot U_i \right]^{-1} L_i^{-1} \cdot G_i = C_i^{-1} \cdot G_i , \tag{56}$$

where $F_i(\infty) = \lim_{R \to \infty} F_i(R)$. Substitution of (56) in (47) and restoring the old
notation, yields, for $R \to \infty$,

$$w_{ij}(t_\infty) = \begin{cases} w_{ij}(t_0) + N^{-1} \sum_{\mu,\nu=1}^{p} \left[ \kappa - \gamma_i^\mu(t_0) \right] (2\xi_i^\mu - 1) (C_i^{-1})^{\mu\nu} \xi_j^\nu , & (j \in V_i) \\ w_{ij}(t_0) , & (j \in V_i^c) \end{cases} \tag{57}$$

where $(C_i^{-1})^{\mu\nu}$ is the inverse of the matrix (52).

By substituting (57) into (12) it can directly be verified that the weights (57) fulfil
(12) for all $\mu$ ($\mu = 1, \dots, p$). For $p = 1$ this was to be expected, since the learning rule
(42) was constructed that way. For $p > 1$ one could, for the same reason, expect that
(12) would be verified by (57) for the final pattern of the learning cycle, $\xi^p$. It is less
transparent, however, that (57) satisfies (12) for all patterns $\xi^\mu$.
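The Gauss-Seidel iteration (53) and its limit (56) can be illustrated on a small reduced correlation matrix. The following is a sketch under stated assumptions: the patterns, the index set `V_i`, the vector `G` and all function names are hypothetical choices for the example, and the iteration starts from $F_i(0) = 0$.

```python
# Gauss-Seidel iteration L F(R) = -U F(R-1) + G for the reduced correlation matrix.
# Toy data and names are illustrative assumptions.

def reduced_correlation(patterns, V_i):
    """C^{mu nu} = N^{-1} sum_{k in V_i} xi_k^mu xi_k^nu, cf. (52)."""
    n = len(patterns[0])
    return [[sum(a[k] * b[k] for k in V_i) / n for b in patterns]
            for a in patterns]

def gauss_seidel(C, G, sweeps):
    """Iterate (53) `sweeps` times, starting from F(0) = 0."""
    p = len(G)
    F = [0.0] * p
    for _ in range(sweeps):
        previous = F[:]
        for mu in range(p):
            # L part (nu < mu) uses values already updated in this sweep,
            # U part (nu > mu) uses the values of the previous sweep.
            lower = sum(C[mu][nu] * F[nu] for nu in range(mu))
            upper = sum(C[mu][nu] * previous[nu] for nu in range(mu + 1, p))
            F[mu] = (G[mu] - lower - upper) / C[mu][mu]
    return F

patterns = [
    [1, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 1, 1, 1],
]
V_i = [0, 1, 2, 4, 5, 6, 7]        # synapse from neuron 3 is not adaptable
C = reduced_correlation(patterns, V_i)
G = [1.0, -1.0, 1.0]               # kappa (2 xi_i^mu - 1) for neuron i = 0, tabula rasa
F = gauss_seidel(C, G, sweeps=100)
residual = max(abs(sum(C[mu][nu] * F[nu] for nu in range(3)) - G[mu])
               for mu in range(3))
```

A vanishing residual means that the iterate satisfies $C_i \cdot F_i = G_i$, i.e., it has reached $C_i^{-1} \cdot G_i$ without the inverse ever being computed explicitly.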

The result (57) is exact for networks with a number of vanishing connections running
from $M = 0$ to $M = N^2 - Np$, i.e., valid for dilution up to $d = 1 - \alpha$, where $\alpha = p/N$.
The analogous calculation performed by Diederich and Opper for networks with empty
$V_i^c$, so that $V_i$ contains all indices, yields a result that coincides with the result obtained
via the usual pseudo-inverse solution || ||] of eq. (12). Hence, the following question may
now arise. Can we solve the eq. (12) for a neural network where $V_i^c$ is not empty and,
consequently, the method of the pseudo-inverse in its standard form is not applicable?



The answer to this question is affirmative. In Appendix B we modify the method of
the pseudo-inverse so as to be applicable to systems with changing and non-changing
interactions. Solving eq. (12) for networks with changing and non-changing connections
via what we have called the modified method of the pseudo-inverse, one indeed obtains
(57), as we also prove in the appendix.

Thus we have shown that the solution that corresponds to the stepwise energetically
most economic way to realize storage of patterns in a partially connected network turns
out to be identical to the one obtained via a modified version of the well-known
mathematical method of the pseudo-inverse applied to the fixed point equation (12).
In other words, the non-local energy saving learning rule (42) leads to the solution of
the fixed point equation (12), obtained via the modified method of the pseudo-inverse,
which is based, in turn, on the reduced correlation matrix.

We conclude this section with a few remarks. In general, the inverse of the matrix
$C_i^{\mu\nu}$ cannot easily be found analytically. However, in the non-biological case that none
of the weights is kept constant, all index sets $V_i^c$ are empty. As a consequence one may
use, for large $N$ and low storage capacity $\alpha := p/N$, the approximations (with $a$ the
mean activity)

$$N^{-1} \sum_{j=1}^{N} \xi_j^\mu = a \tag{58}$$

$$N^{-1} \sum_{j=1}^{N} \xi_j^\mu \xi_j^\nu = a^2 , \quad (\mu \neq \nu) . \tag{59}$$

Substitution of (58) and (59) into (52), where now $V_i$ is the set of all indices, yields

$$C^{\mu\nu} = a(1 - a)\, \delta^{\mu\nu} + a^2 . \tag{60}$$

For the inverse of $C^{\mu\nu}$ we thus obtain from (60) the simple analytical expression

$$(C^{-1})^{\mu\nu} = \frac{1}{a(1 - a)} \left[ \delta^{\mu\nu} - \frac{a}{ap - a + 1} \right] . \tag{61}$$
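That (61) is indeed the inverse of (60) can be checked directly. In the sketch below, $J$ denotes the $p \times p$ matrix with all elements equal to one (so that $J^2 = pJ$) and $b := a(1 - a)$; both symbols are introduced here only for the verification.

```latex
\begin{align*}
C\, C^{-1}
  &= \left( b I + a^2 J \right) \frac{1}{b} \left( I - \frac{a}{ap - a + 1}\, J \right) \\
  &= I + \left[ \frac{a^2}{b} - \frac{a}{ap - a + 1} - \frac{a^3 p}{b\,(ap - a + 1)} \right] J \\
  &= I + \frac{a^2 (ap - a + 1) - a \cdot a(1 - a) - a^3 p}{b\,(ap - a + 1)}\, J
   = I ,
\end{align*}
```

since the numerator $a^3 p - a^3 + a^2 - a^2 + a^3 - a^3 p$ vanishes identically.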
Using (61) in (57) leads to

$$w_{ij}(t_\infty) = w_{ij}(t_0) - \frac{1}{N a (1 - a)}\, \frac{a}{ap - a + 1} \sum_{\mu,\nu=1}^{p} \left[ \kappa - \gamma_i^\mu(t_0) \right] (2\xi_i^\mu - 1)\, \xi_j^\nu + \frac{1}{N a (1 - a)} \sum_{\mu=1}^{p} \left[ \kappa - \gamma_i^\mu(t_0) \right] (2\xi_i^\mu - 1)\, \xi_j^\mu . \tag{62}$$

Equation (62) is an explicit expression for the weights of a (non-biological) network
in which all the weights, including the self-interactions $w_{ii}$, are present.

Kanter and Sompolinsky used the result (57) in case $i \neq j$ for a fully connected
network without self-interactions || . Their ad-hoc assumption that the self-interactions
$w_{ii}$ can be put equal to zero turns out to be justified in view of our exact result (57)
with $w_{ii}(t_0) = 0$.

5. A learning rule with maximal learning efficiency 

In the preceding section learning of a collection of patterns was achieved by repeated
application of the non-local energy saving learning rule. This learning rule was not
constructed in such a way that conservation of storage of old patterns was automatically
guaranteed when a new pattern was stored. We now address the question whether and
how storage of a new pattern $\xi^{p+1}$ can be achieved without disturbing the storage of
the old patterns $\xi^1, \dots, \xi^p$. We shall refer to this type of learning as maximally efficient
learning.

Linkevich || treated this problem on the basis of a mathematical model, in which
suppositions are made which cannot be true in a biological neural network. Firstly, he
treated the thresholds $T_i(t)$, eq. (5), as a vanishing constant. Moreover, his network has
symmetric connections $w_{ij}(t) = w_{ji}(t)$, whereas a biological network has non-symmetric
connections $w_{ij}(t) \neq w_{ji}(t)$. Finally, his network is fully connected, i.e., all $w_{ij}(t) \neq 0$.

We may improve and generalize the reasoning of Linkevich to obtain a maximally
efficient learning rule for a partially connected network with non-symmetric connections.
The calculations only hold for networks in which the thresholds are equal to the stability
coefficients $\kappa$, i.e., $\theta_i = \kappa$ for all $i$, and in case the initial connections are equal to zero,
$w_{ij}(t_0) = 0$ for all $i$ and $j$. As a final result we arrive, in this particular case, at the
following rule for learning with maximal learning efficiency (see Appendix A)



lev, 

From (63) we immediately see that, in general, $\Delta w_{ij}$ is not symmetric in $i$ and
$j$. However, for a network in which all connections may change we find that $\Delta w_{ij}$ is
symmetric in $i$ and $j$, in accordance with the result of Linkevich. Note that the
$i$-dependent factors in the numerators of (42) and (63) are identical, which reflects the
fact that the new pattern $\xi^{p+1}$ has to obey the fixed point equation, both in the cases
of 'stepwise minimal change in energy' (42) and of 'stepwise maximal efficient learning'
(63).

The learning rule with maximal learning efficiency (63) is of the form (23), a form
which we have rejected, in section 3, on biological grounds. We therefore shall not
pursue any further the analysis of the learning rule with maximal learning efficiency in
the remainder of this article.

6. Locality of learning rules 

Up to now we did not mention an important limitation of a biological learning rule. The 
mathematical learning rule to change a weight of a network can, in principle, be local or 
non-local. The second possibility must be excluded in case the weight is associated with 
a synapse: there is no biological construction available in the brain to tell a specific 
synapse how and when to change as a function of properties of neurons with which 
it has no direct contact. The modifications must result from the local situation, i.e., 
limited to the situation spatially 'close enough' to the synapse in question, and within a 
'brief span' of time. Thus, a change $\Delta w_{ij}$ may depend only on variables local, in space
and time, to the neurons $i$ and $j$. The local variables available at the synapse between
neurons $i$ and $j$ are the activities $\xi_i$ and $\xi_j$, the post-synaptic potentials $h_i$ and $h_j$, and
the thresholds $\theta_i$ and $\theta_j$. Hence, the factors $\epsilon_{ij}$ occurring in Hebb rules should depend
on these variables only

$$\epsilon_{ij} = \epsilon_{ij}(\xi_i, h_i, \theta_i, \xi_j, h_j, \theta_j) . \tag{64}$$

The energy saving learning rule (42) for $\Delta w_{ij}$ guarantees, after repeated application,
storage of patterns in a way which is energetically efficient. The factor between square
brackets in the non-local learning rule (42) fulfils the criterion of locality. However, the
learning rule as a whole is not a local learning rule because of the factor,

$$\frac{1}{\sum_{k \in V_i} \xi_k} \tag{65}$$

which depends, because of the sum over $k$'s restricted to $V_i$, eq. (0), on the network
connectivity, and hence, not on properties related to neurons $i$ and $j$ only. If we
approximate (65) by some constant, $\eta_i$ say, we do obtain a learning rule that is local,

$$\Delta w_{ij}(t_n) = \eta_i \left[ \kappa - (h_i(t_n) - \theta_i)(2\xi_i - 1) \right] (2\xi_i - 1)\, \xi_j . \tag{66}$$

We shall refer to (66) as the local energy saving learning rule. The better $\eta_i$ approximates
a value dictated by (65), the better this local learning rule will be with respect to its
energetic efficiency.
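The behaviour of the local rule (66) under repeated application can be sketched on a toy network. This is an illustration under assumptions not taken from the article: a six-neuron pattern, a hypothetical connectivity `V`, vanishing thresholds, $\kappa = 1$, and the constant $\eta_i = 1/(Na)$ with $a$ the fraction of active neurons. In this setting each application multiplies the distance $\kappa - \gamma_i$ by $1 - \eta_i \sum_{k \in V_i} \xi_k$, so the stability coefficients approach $\kappa$ geometrically rather than in a single step.

```python
# Local energy saving rule (66) with a constant learning factor eta.
# Toy data and names are illustrative assumptions.

def stability(w_row, xi, i):
    """gamma_i = h_i (2 xi_i - 1) for vanishing thresholds."""
    h = sum(w_ij * x_j for w_ij, x_j in zip(w_row, xi))
    return h * (2 * xi[i] - 1)

def local_step(w, xi, V, eta, kappa=1.0):
    """Apply rule (66) once to all adaptable synapses; w is updated in place."""
    for i, row in enumerate(w):
        factor = eta * (kappa - stability(row, xi, i)) * (2 * xi[i] - 1)
        for j in V[i]:
            row[j] += factor * xi[j]

N = 6
xi = [1, 1, 0, 1, 0, 0]
a = sum(xi) / N                    # mean activity of the pattern
eta = 1.0 / (N * a)                # constant replacing the factor (65)
# Diluted connectivity: neuron i has no synapse from itself or from neuron i+1.
V = [[j for j in range(N) if j not in (i, (i + 1) % N)] for i in range(N)]
w = [[0.0] * N for _ in range(N)]  # tabula rasa

local_step(w, xi, V, eta)
gamma_once = min(stability(w[i], xi, i) for i in range(N))
for _ in range(59):
    local_step(w, xi, V, eta)
gamma_many = min(stability(w[i], xi, i) for i in range(N))
```

After one application the smallest stability coefficient is positive but below $\kappa$; after many applications it is indistinguishable from $\kappa$.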



At this point it is important to note that the proof of convergence of section 4.2
can be generalized, replacing everywhere the factor (65) by the constant positive factor
$\eta_i$. As a final result (57) is found again, provided certain restrictions on $\eta_i$ are satisfied.
It then can be proved [23] that the local, biologically realizable energy saving learning
rule yields the same final values $w_{ij}(t_\infty)$ as the non-local energy saving learning rule.

As noticed in section |l], the constant $\eta_i$ is a neuron property, the determination of
which is outside the scope of the present article: we then would have to determine the
coefficients $c_{ij}$ in the expression (31) explicitly.

A reasonable approximation for $\eta_i$ can easily be obtained for a fully connected
network where all connections may change in time. For such a network we have the
approximation $\sum_k \xi_k \approx Na$ for the denominator of (65), which implies

$$\eta_i \approx (Na)^{-1} , \quad \text{for all } i = 1, \dots, N . \tag{67}$$

We will use this approximation in the following section where we consider a biological 
network. 



7. Local versus non-local learning 

In this section, we will study numerically, for a biological network with dilution $d$, the
local energy saving learning rule (66) as a competitor of the non-local learning rule (42).
For $\eta_i$ we take, quite arbitrarily, the constant (67). We could as well have taken $1/N$
or $1/(N(1 - d))$: the essentials of the behaviour of the numerical results are not very
sensitive to the precise values of the $\eta_i$.

In order to judge the functioning of a recurrent network with respect to its ability
to store an arbitrary collection of $p$ patterns $\xi^\mu$ ($\mu = 1, \dots, p$), we take $L$ sets of such
collections, and label them by $m$ ($m = 1, \dots, L$), i.e., $\xi^{\mu,m}$ is pattern $\mu$ of set $m$.
The performance of the network with respect to the patterns from the $m$-th set may be
characterized by the $Np$ stability coefficients $\gamma_i^{\mu,m}$ ($i = 1, \dots, N$; $\mu = 1, \dots, p$) defined in
equation (^). The stability coefficients $\gamma_i^{\mu,m}$ should be positive [see eq. fllPP]. Moreover,
we have normalized in such a way that the $\gamma$'s should be close to one. Hence, the more
$\gamma_i^{\mu,m}$ we find with values around one, the better the network will perform.

We first define for the particular set $m$ of $p$ patterns the quantity:

$$\gamma^m = \min_{i = 1, \dots, N} \left( \gamma_i^{1,m}, \dots, \gamma_i^{p,m} \right) . \tag{68}$$

Hence, $\gamma^m$ is the minimal value of all stability coefficients for a particular set $m$ of $p$
patterns. A network does not function if $\gamma^m$ is negative, and functions better and better
when $\gamma^m$ becomes closer to one (with the normalization $\kappa = 1$). To find a number that
characterizes the network performance for an arbitrary set of $p$ patterns, we average the
minimal values $\gamma^m$ over $L$ arbitrarily chosen sets,

$$\bar{\gamma} = \frac{1}{L} \sum_{m=1}^{L} \gamma^m . \tag{69}$$

Hence, $\bar{\gamma}$ is the average with respect to the $L$ sets of $p$ patterns. We therefore will
refer to $\bar{\gamma}$ as the average performance of the network. Similarly, we define the average
energy change $\overline{\Delta E}$

$$\overline{\Delta E} = \frac{1}{L} \sum_{m=1}^{L} \Delta E^m , \tag{70}$$

where $\Delta E^m$ is the change of energy in one learning step of the $m$-th set of patterns.
Furthermore, we define the average energy change per synapse $\Delta e$, as

$$\Delta e = \overline{\Delta E} / (N^2 - M) , \tag{71}$$






where $M$ is the number of non-changing synapses. We also will study the performance
of neural networks with varying dilution by considering the distribution of the stability
coefficients $\gamma_i^{\mu,m}$. By studying numerically the quantities $\bar{\gamma}$ and $\Delta e$ and the distribution
of the stability coefficients $\gamma_i^{\mu,m}$, we can judge the power of the (exact) non-local energy
saving learning rule (42) compared to the (biologically feasible) local energy saving
learning rule (66)-(67).
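The performance measures (68)-(71) translate directly into a few lines of code. The sketch below is illustrative: the nested-list layout `gammas[m][mu][i]`, the function names and the example numbers are assumptions made here, not part of the article's numerical setup.

```python
# Performance measures of section 7: minimal stability per set, its average
# over sets, and the average energy change per adaptable synapse.
# Data layout and names are illustrative assumptions.

def average_performance(gammas):
    """Average over sets m of the minimal stability coefficient, eqs. (68)-(69).

    gammas[m][mu][i] is the stability coefficient of neuron i
    for pattern mu of set m.
    """
    minima = [min(g for per_pattern in one_set for g in per_pattern)
              for one_set in gammas]
    return sum(minima) / len(minima)

def energy_per_synapse(delta_e_sets, n_neurons, n_fixed):
    """Average energy change per adaptable synapse, eqs. (70)-(71)."""
    mean_delta_e = sum(delta_e_sets) / len(delta_e_sets)
    return mean_delta_e / (n_neurons ** 2 - n_fixed)

# Two sets (L = 2) of two patterns (p = 2) in a network of two neurons (N = 2).
example_gammas = [[[1.0, 0.5], [0.8, 0.9]],
                  [[0.3, 1.0], [1.0, 1.0]]]
performance = average_performance(example_gammas)
delta_e = energy_per_synapse([2.0, 4.0], n_neurons=4, n_fixed=6)
```

Because (68) takes a minimum, a single badly stored pattern at a single neuron determines the score of a whole set, which makes $\bar{\gamma}$ a conservative measure.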

7.1. Storage of one pattern 

Performance. The non-local energy saving learning rule (42) and its local
approximation (66)-(67) are used to store one pattern $\xi$. In order to compare the
quality of the two learning rules we have plotted in figure 1 the average performance
$\bar{\gamma}$ versus the dilution $d$ of the network for both learning rules. We see that the
non-local learning rule stores a new pattern such that $\bar{\gamma} = 1$, as could be expected since it
has been designed that way. Moreover, we see that both the non-local and the local
learning rules lead to positive values of $\bar{\gamma}$, and, hence, lead to storage of the pattern $\xi$.
The non-local learning rule, however, leads at once to $\bar{\gamma} = 1$, whereas the local learning
rule converges to $\bar{\gamma} = 1$ only after repeated application. Hence, basins of attraction of
the local learning rule are smaller initially [see figure 1].



Use of energy. Furthermore, we consider the average energy change per synapse $\Delta e$
(71) for the non-local and local learning rules as a function of the number of synapses
in a network of a fixed number of neurons. In case of a single application of an energy
saving learning rule, it turns out that for the non-local learning rule $\Delta e$ increases as the
number of synapses decreases, while $\Delta e$ is constant in case of the local learning rule.
This favourable situation of remaining constant apparently is an unexpected positive
effect of the approximation made when going from a non-local energy saving learning
rule to a local energy saving learning rule.
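A rough estimate makes this difference plausible. The following sketch is not part of the article; it assumes a tabula rasa start with vanishing thresholds (so that $\gamma_i = 0$), equal constants $c_i = c$, and random dilution, so that $\sum_{k \in V_i} \xi_k \approx aN(1 - d)$ and $N^2 - M \approx N^2 (1 - d)$.

```latex
\begin{align*}
\text{non-local rule (42):}\quad
\Delta E &= \sum_{i} c \sum_{j \in V_i} \frac{\kappa^2\, \xi_j}{\left( \sum_{k \in V_i} \xi_k \right)^2}
          = \sum_{i} \frac{c\, \kappa^2}{\sum_{k \in V_i} \xi_k}
          \approx \frac{c\, \kappa^2}{a (1 - d)} ,
&\Delta e &\approx \frac{c\, \kappa^2}{a N^2 (1 - d)^2} , \\
\text{local rule (66):}\quad
\Delta E &= \sum_{i} c\, \eta^2 \kappa^2 \sum_{k \in V_i} \xi_k
          \approx c\, \eta^2 \kappa^2\, a N^2 (1 - d) ,
&\Delta e &\approx c\, \eta^2 \kappa^2 a .
\end{align*}
```

Within these assumptions the energy per synapse of the non-local rule grows as $(1 - d)^{-2}$ with increasing dilution, whereas for the local rule it is independent of $d$; for $\eta = (Na)^{-1}$ the two estimates coincide at $d = 0$.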

In case of repeated application there is almost no energy effect for the non-local
learning rule, and a slight effect for the local learning rule: the energy need per synapse
grows with growing dilution [see figure 2].



7.2. Storage of p patterns 

Having studied numerically the storage of one pattern, we now turn to the storage
of $p$ patterns. As pointed out in section 4.2 this may be achieved through repeated
application of the energy saving learning rule.

Storage of one pattern ($p = 1$) could be achieved in such a way that, by construction,
all $\gamma_i^{\mu,m}$ ($\mu = 1$) were equal to one in case of the non-local learning rule: $\gamma_i^{1,m} = 1$ for
all $i$ and $m$. As a consequence, the local energy saving learning rule, which is an
approximation to the non-local one, has the property that all $\gamma_i^{1,m}$ are 'not too far away'
from the value $\kappa = 1$, i.e., they are positive. We recall that positivity of the stability
coefficients $\gamma_i^{\mu,m}$ is a sufficient criterion for a network to store what should be stored [see
figure 0].

Figure 1. The average performance, $\bar{\gamma}$, of a network of 512 neurons as a function of its dilution
$d$. Dilution $d = 0$ means that the network is fully interconnected ($w_{ij} \neq 0$ for all $i$ and $j$), dilution
$d = 1$ means that there are no connections anymore ($w_{ij} = 0$ for all $i$ and $j$). The one pattern
$\xi$ is chosen arbitrarily, but such that the mean activity $a = 0.2$. The computations have been
averaged over 100 different $\xi$. The error bars give the standard deviation of the averaged stability
coefficients $\gamma_i$ ($i = 1, \dots, N$). The calculations are performed starting from a tabula rasa for the
weights ($w_{ij}(t_0) = 0$) and vanishing thresholds ($\theta_i = 0$).

Figures (a),(b). In the first two figures, a comparison between the non-local energy saving learning
rule (42) (upper curves) and the local energy saving learning rule (66) (lower curves) after it has
been applied one, (a), and five, (b), times.

Figure (c). In the last figure, a comparison of the local energy saving learning rule (66) after it
has been applied one (lower curve), five and ten (upper curve) times.

When the energy saving learning rule is used to store more than one pattern, the
positivity of the stability coefficients of all but the last stored pattern is not guaranteed.
As noted before, we must allow for the fact that storage of a new pattern may spoil the
storage of older patterns. Therefore, the requirement that the minimum of all $\gamma_i^{\mu,m}$
($\mu = 1, \dots, p$) should be positive is too strong. Forgetting thus turns out to be an
inevitable consequence of storing new patterns, at least in the beginning. By repeating
the learning procedure for whole sequences of patterns we can achieve that more and
more $\gamma_i^{\mu,m}$ become positive, suggesting that more and more patterns may be definitely
stored.

In order to judge the performance of the network in case of storage of more patterns,
we now picture the distribution of the $\gamma_i^{\mu,m}$ over the real axis. Ideally, all $\gamma_i^{\mu,m}$ should
be equal to $\kappa = 1$. In figure 3 the distribution has been plotted for both the non-local
and local energy saving learning rule. As one observes from figure 3, some of the $\gamma$'s
have values smaller than one (and even negative) whereas others have values larger than
one. This is due to the fact that when storing in set $m$ a pattern $\xi^\nu$, the $\gamma_i^{\mu,m}$'s of the other
patterns $\xi^\mu$ ($\mu \neq \nu$) are not taken into account in the learning step and as a consequence
can be enlarged or reduced in value. We have chosen to put the number of $\gamma$'s with
values outside the plotted interval in the very first and the very last interval: see, e.g.,
figure 3e.

Figure 2. The average energy consumed per synapse $\Delta e$ in one learning step, of a network of
512 neurons as a function of its dilution $d$. The one pattern $\xi$ is chosen arbitrarily, but such
that the mean activity $a = 0.2$. The computations have been averaged over 100 different $\xi$. The
error bars give the standard deviation of the averaged stability coefficients $\gamma_i$ ($i = 1, \dots, N$). The
calculations are performed starting from a tabula rasa for the weights ($w_{ij}(t_0) = 0$) and vanishing
thresholds ($\theta_i = 0$).

Figure (a). The average energy change per synapse $\Delta e$ for the non-local energy saving learning
rule after one (upper curve) and two learning steps (lower curve, coinciding with the horizontal
axis).

Figure (b). The average energy change per synapse $\Delta e$ for the local energy saving learning rule
caused by the first (upper curve), second or fifth (lower curves) time that the local energy saving
rule (66)-(67) is used.

The general conclusion is that the local energy saving learning rule, although in
principle approximative, is an excellent competitor of the non-local one. After five
learning cycles already the number of negative $\gamma_i^{\mu,m}$ is negligible [see figures 3b and 3f],
and the distributions of the $\gamma_i^{\mu,m}$'s are comparable.

We finally make some observations regarding other learning rules. In view of
(HP) the symmetric learning rule (23) yields the same values of the $\gamma$'s as in case
of our asymmetric learning rule (20). Hence, in particular, the whole analysis of this
section holds true for the symmetric learning rule as well. In other words, although the
changes $\Delta w_{ij}$ in the weights $w_{ij}$ as given by the symmetric learning rule (23) are, of
course, different from those given by our asymmetric learning rule (20), the convergence
properties, studied here via the $\gamma$'s, are exactly the same for the symmetric learning
rule (23) and our asymmetric learning rule (20). The 'wrong' asymmetric learning rule
(E2|) does not work at all, as has been explained at the end of section R.

Figure 3. The average number of stability coefficients $n_\gamma$ per interval of size 0.05, divided by
the total number of the stability coefficients $\gamma_i^{\mu,m}$, given by $NpL$, has been plotted for a neural
network with dilution 0.6, after one or more learning cycles, for the non-local and local energy
saving learning rules.

The calculations have been performed for a tabula rasa network, $w_{ij}(t_0) = 0$, of $N = 128$ neurons
with vanishing thresholds ($\theta_i = 0$). An average has been taken of $L = 100$ sets of $p = 32$ patterns.
The average activity is $a = 0.2$.

Figures (a-d). The average number of stability coefficients after 1, 5, 10 and 20 learning cycles in
case of the local energy saving learning rule (66)-(67).

Figures (e-h). The average number of stability coefficients after 1, 5, 10 and 20 learning cycles in
case of the non-local energy saving learning rule (42).



8. Summary 

We have shown that two different arguments, a biological one (section and a physical 
one (section f|) lead to a Hebb rule of the same asymmetric form: compare eqs. (p0|)- 
(p|) at the one hand and eq. (^) at the other hand. A learning rule of this form is 
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never, or at least not often, used in the physical literature, which, in general, is less 
concerned with an accurate modeling of a biological network. 

The biological argument was largely based on the improbability of a change of 
connections if the pre-synaptic neuron was inactive. The physical argument was based 
on the expression ([!5|) for the energy change, not on any ad-hoc cost-function like (p7|) as 
has been done so far in the literature. The local version of the energy saving Hebb rule 
(JG|), given by eqs. (66)-(67), may be relevant for biological systems. It has been tested
numerically in section 7, and turns out to yield storage of patterns in a satisfactory way:
see in particular figure ||.

Appendix A. Maximal efficient learning

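We shall here merely verify the maximal efficient learning rule, not derive the rule, since
the derivation closely parallels the one of Linkevich ||. In view of the special constraints
mentioned directly above eq. (JjBD, eq. (12) reduces to

$$\sum_{j=1}^{N} w_{ij}(t)\,\xi_j^{\mu} = 2\kappa\,\xi_i^{\mu}, \qquad (\mu = 1,\dots,p). \tag{A.1}$$

Similarly, the solution (57) of (12) reduces to

$$w_{ij}(t) = \begin{cases} 2\kappa \sum_{\mu,\nu=1}^{p} \xi_i^{\mu}\,(C_i^{-1})^{\mu\nu}\,\xi_j^{\nu}, & (j \in V_i) \\ 0, & (j \in V_i^c) \end{cases} \tag{A.2}$$

In order to store a new pattern $\xi^{p+1}$, the new weights $w_{ij}(t')$ have to obey the
equations

$$\sum_{j=1}^{N} w_{ij}(t')\,\xi_j^{\mu} = 2\kappa\,\xi_i^{\mu}, \qquad (\mu = 1,\dots,p+1). \tag{A.3}$$

The weights $w_{ij}(t')$ are related to the weights $w_{ij}(t)$ by

$$w_{ij}(t') = \begin{cases} w_{ij}(t) + \Delta w_{ij}(t), & (j \in V_i) \\ w_{ij}(t), & (j \in V_i^c) \end{cases} \tag{A.4}$$

where the $w_{ij}(t)$ are the connections after storage of the patterns $\xi^{1},\dots,\xi^{p}$ as given by
equation (A.2) and the $\Delta w_{ij}(t)$ are given by (63).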

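Inserting (A.4) with (A.2) and (63) into the left-hand side of (A.3) yields

$$\sum_{j=1}^{N} w_{ij}(t')\,\xi_j^{\mu} = \begin{cases} 2\kappa\,\xi_i^{\mu} + \sum_{j \in V_i} \Delta w_{ij}(t)\,\xi_j^{\mu}, & (\mu = 1,\dots,p) \\ 2\kappa\,\xi_i^{\mu}, & (\mu = p+1) \end{cases} \tag{A.5}$$

The right-hand side of these equations is equal to that of (A.3) if

$$\sum_{j \in V_i} \Delta w_{ij}(t)\,\xi_j^{\mu} = 0, \qquad (\mu = 1,\dots,p). \tag{A.6}$$

In order to show that (A.6) holds, we first decompose $\xi_j^{p+1}$ according to

$$\xi_j^{p+1} = \sum_{\mu=1}^{p} a^{\mu}\,\xi_j^{\mu} + \psi_j^{p+1}, \tag{A.7}$$

where $a^{\mu}$ and $\psi_j^{p+1}$ have been taken such that

$$\sum_{j \in V_i} \xi_j^{\mu}\,\psi_j^{p+1} = 0, \qquad (\mu = 1,\dots,p). \tag{A.8}$$

Using (A.2) and (A.8) one may prove the auxiliary relation

$$\sum_{k \in V_i} w_{ik}(t)\,\psi_k^{p+1} = 0. \tag{A.9}$$

The proof of (A.6) is now straightforward. First, substitution of (63) into the left-hand side
of (A.6) yields

$$\sum_{j \in V_i} \Delta w_{ij}(t)\,\xi_j^{\mu} \propto \sum_{j \in V_i} \Bigl[ 2\kappa\,\xi_j^{p+1} - \sum_{k \in V_j} w_{jk}(t)\,\xi_k^{p+1} \Bigr]\,\xi_j^{\mu}, \qquad (\mu = 1,\dots,p). \tag{A.10}$$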



Then, substituting the decomposition (A.7) in (A.10), and using (A.1), (A.8) and (A.9),
we see that this expression vanishes, which proves (A.6). Hence, the left-hand side of
(A.3) equals the right-hand side of (A.3) for a learning rule given by (63).
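
The linear algebra behind the decomposition (A.7)-(A.8) and the auxiliary relation (A.9) can be checked numerically. The following is a minimal NumPy sketch, not the authors' code. It assumes binary patterns, a single neuron `i` with a hypothetical index set `V`, illustrative values for `N`, `p` and `kappa`, and tabula rasa weights of the form $w_{ij}(t) = 2\kappa \sum_{\mu\nu} \xi_i^\mu (C_i^{-1})^{\mu\nu} \xi_j^\nu$ for $j \in V_i$ and zero otherwise.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, kappa = 30, 4, 1.0           # illustrative sizes, not taken from the paper
i = 0                              # the post-synaptic neuron under consideration

# Rows 0..p-1: stored patterns xi^1..xi^p; row p: the new pattern xi^{p+1}.
xi = rng.integers(0, 2, size=(p + 1, N)).astype(float)
old, new = xi[:p], xi[p]

# Hypothetical index set V_i: neuron i receives input from two thirds of the neurons.
V = np.arange(N) % 3 != 0

# Correlation matrix of the stored patterns, restricted to the indices in V_i.
C = old[:, V] @ old[:, V].T

# Coefficients a^mu and remainder psi^{p+1} of the decomposition (A.7),
# fixed by the orthogonality condition (A.8) on the index set V_i.
a = np.linalg.solve(C, old[:, V] @ new[V])
psi = new - a @ old

# Weights of neuron i after storing the first p patterns (assumed form, see lead-in).
w_i = np.zeros(N)
w_i[V] = 2 * kappa * np.linalg.solve(C, old[:, i]) @ old[:, V]

print("(A.1) residual:", np.abs(old @ w_i - 2 * kappa * old[:, i]).max())
print("(A.8) residual:", np.abs(old[:, V] @ psi[V]).max())
print("(A.9) residual:", abs(w_i[V] @ psi[V]))
```

All three residuals should be at the level of floating-point rounding. The sketch requires the restricted correlation matrix `C` to be invertible, i.e. the stored patterns must be linearly independent on the index set `V`.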



Appendix B. Modified method of the pseudo-inverse 

Consider the $p$ sets of $N$ linear equations

$$\sum_{j=1}^{N} w_{ij}\,x_j^{\mu} = a_i^{\mu}, \qquad (i = 1,\dots,N;\ \mu = 1,\dots,p) \tag{B.1}$$

where $x_j^{\mu}$ and $a_i^{\mu}$ are known constants ($j = 1,\dots,N$; $\mu = 1,\dots,p$). The $N^2$ unknowns
$w_{ij}$ are not determined as long as $p < N$. Let $V_i$ be the subset of indices $j$ with the
property that $w_{ij}$ is a solution of the set of equations (B.1), and let the complement of
the set $V_i$ with respect to the total set of indices $(1,\dots,N)$, denoted by $V_i^c$, contain the
indices $j$ with the property that the $w_{ij}$ have the prescribed constant values $b_{ij}$, i.e.,

$$w_{ij} = b_{ij}, \qquad (j \in V_i^c), \tag{B.2}$$

chosen in such a way that the system of equations (B.1) does not become incompatible.
If the set $V_i^c$ is empty, a solution of (B.1) can be obtained via the Moore-Penrose
pseudo-inverse matrix |j[ §]. We want to obtain a solution for $w_{ij}$ of (B.1)-(B.2) in case $V_i^c$
is not empty, and the pseudo-inverse matrix cannot be used directly. To that end, we
construct a new set of equations, closely related to (B.1)-(B.2), which can be solved
via the pseudo-inverse. We will refer to this construction as the modified method of the
pseudo-inverse.

* In case all connections may change in time, the index sets $V_i$ are all equal to the set of all indices.
Then the equations (A.7) with $j \in V_i^c$ disappear and (A.8) amounts to the condition that the vector
$\psi^{p+1}$ is orthogonal to the vectors $\xi^{\mu}$ ($\mu = 1,\dots,p$). Hence, in this particular case there are $p + N$
restrictions (A.7) and (A.8) for $p + N$ variables $a^{\mu}$ and $\psi_j^{p+1}$.

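We first define a new set of variables $\tilde{w}_{ij}$ according to

$$\tilde{w}_{ij} = w_{ij} - b_{ij}, \qquad (i,j = 1,2,\dots,N), \tag{B.3}$$

where the $b_{ij}$ are arbitrary in case $j \in V_i$. We then have

$$\tilde{w}_{ij} = \begin{cases} w_{ij} - b_{ij}, & (j \in V_i) \\ 0, & (j \in V_i^c) \end{cases} \tag{B.4}$$

The under-determined set of $pN$ linear equations (B.1)-(B.2) can now be rewritten

$$\sum_{j \in V_i} \tilde{w}_{ij}\,x_j^{\mu} = \tilde{a}_i^{\mu}, \qquad (\mu = 1,\dots,p), \tag{B.5}$$

where

$$\tilde{a}_i^{\mu} = a_i^{\mu} - \sum_{j=1}^{N} b_{ij}\,x_j^{\mu}. \tag{B.6}$$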

Note that (B.5) cannot be solved with the help of the pseudo-inverse, since the
summation is only with respect to a restricted set of indices $j \in V_i$. We therefore
consider a new set of $pN$ linear equations, namely

$$\sum_{j=1}^{N} v_{ij}\,y_j^{\mu} = \tilde{a}_i^{\mu}, \qquad (\mu = 1,\dots,p). \tag{B.7}$$

The relation of (B.7) to (B.5) can be made clear by taking

$$y_j^{\mu} = \begin{cases} x_j^{\mu}, & (j \in V_i) \\ 0, & (j \in V_i^c) \end{cases} \tag{B.8}$$

since then the set of equations (B.7) for the $N^2$ unknowns $v_{ij}$ ($i,j = 1,2,\dots,N$) becomes
identical to the set of equations (B.5) for the unknown $\tilde{w}_{ij}$ ($i = 1,\dots,N$; $j \in V_i$). The
equation (B.7) can be solved with the help of the pseudo-inverse. The solution reads

$$v_{ij} = \sum_{\mu,\nu=1}^{p} \tilde{a}_i^{\mu}\,(C^{-1})^{\mu\nu}\,y_j^{\nu}, \qquad (i,j = 1,2,\dots,N), \tag{B.9}$$

where $C^{\mu\nu}$ is the usual correlation matrix [24]

$$C^{\mu\nu} = \sum_{k=1}^{N} y_k^{\mu}\,y_k^{\nu}. \tag{B.10}$$
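
As a cross-check of the closed form (B.9), the sketch below compares it with NumPy's Moore-Penrose routine `numpy.linalg.pinv`. It is an illustration under assumptions: the arrays `Y` and `A` are random stand-ins for the known vectors and the right-hand sides of (B.7), and the sizes `N` and `p` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 12, 3                       # illustrative sizes with p < N

Y = rng.normal(size=(p, N))        # Y[mu, j]: known vectors entering (B.7)
A = rng.normal(size=(N, p))        # A[i, mu]: right-hand sides of (B.7)

C = Y @ Y.T                                # p x p correlation matrix, as in (B.10)
v_closed = A @ np.linalg.inv(C) @ Y        # closed form (B.9)
v_pinv = A @ np.linalg.pinv(Y.T)           # minimum-norm solution of v Y^T = A

print("closed form equals pinv route:", np.allclose(v_closed, v_pinv))
print("closed form solves (B.7):", np.allclose(v_closed @ Y.T, A))
```

The agreement relies on the $p$ rows of `Y` being linearly independent, so that the correlation matrix is invertible; for $p < N$ and generic data this is the typical situation.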



If we use (B.8), the matrix $C^{\mu\nu}$ becomes what we have called the 'reduced correlation
matrix', given by

$$C_i^{\mu\nu} = \sum_{k \in V_i} x_k^{\mu}\,x_k^{\nu}. \tag{B.11}$$

The modified correlation matrix takes into account the modifications in the usual
correlation matrix due to the particular network architecture as dictated by the index
set $V_i$. The solutions become, using (B.8),

$$v_{ij} = \begin{cases} \sum_{\mu,\nu=1}^{p} \tilde{a}_i^{\mu}\,(C_i^{-1})^{\mu\nu}\,x_j^{\nu}, & (j \in V_i) \\ 0, & (j \in V_i^c) \end{cases} \tag{B.12}$$

Hence, the solution (B.12) turns out to be compatible with (B.4) for $j \in V_i^c$. Putting
now

$$\tilde{w}_{ij} = v_{ij}, \qquad (i,j = 1,2,\dots,N), \tag{B.13}$$

we have obtained a solution for (B.5), as follows by comparing (B.7) and (B.5). In this
way we find, transforming back from $\tilde{w}_{ij}$ to $w_{ij}$ with the help of (B.3), and substituting
(B.6), the final result for the solution of the under-determined set of equations (B.1)-(B.2):

$$w_{ij} = \begin{cases} b_{ij} + \sum_{\mu,\nu=1}^{p} \Bigl[ a_i^{\mu} - \sum_{k=1}^{N} b_{ik}\,x_k^{\mu} \Bigr] (C_i^{-1})^{\mu\nu}\,x_j^{\nu}, & (j \in V_i) \\ b_{ij}, & (j \in V_i^c) \end{cases} \tag{B.14}$$

We recall that the $b_{ij}$ are arbitrary for $j \in V_i$, and prescribed for $j \in V_i^c$. Notice that
the solution (B.14) is not unique because of the arbitrary constants $b_{ij}$ ($j \in V_i$).
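
A minimal NumPy sketch of the construction leading to (B.14) is given below. The function name `modified_pseudo_inverse`, the Boolean `mask` encoding the index sets $V_i$, and the random test data are illustrative assumptions, not part of the original derivation. The free constants $b_{ij}$ for $j \in V_i$ are simply taken from the supplied array `b`.

```python
import numpy as np

def modified_pseudo_inverse(x, a, b, mask):
    """Solve sum_j w[i, j] * x[mu, j] = a[i, mu] with w[i, j] = b[i, j] outside V_i.

    x: (p, N) known constants, a: (N, p) right-hand sides,
    b: (N, N) constants b_ij, mask: (N, N) bool with mask[i, j] True iff j is in V_i.
    """
    w = b.astype(float).copy()
    for i in range(a.shape[0]):
        x_V = x[:, mask[i]]                    # x restricted to the indices j in V_i
        C_i = x_V @ x_V.T                      # reduced correlation matrix (B.11)
        a_tilde = a[i] - x @ b[i]              # shifted right-hand side (B.6)
        # (B.14): C_i is symmetric, so solve() supplies the contraction with C_i^{-1}
        w[i, mask[i]] += np.linalg.solve(C_i, a_tilde) @ x_V
    return w

rng = np.random.default_rng(2)
N, p = 10, 3
x = rng.normal(size=(p, N))
a = rng.normal(size=(N, p))
b = rng.normal(size=(N, N))
# Hypothetical architecture: w[i, j] is adjustable unless (i + j) is a multiple of 3.
mask = (np.add.outer(np.arange(N), np.arange(N)) % 3) != 0

w = modified_pseudo_inverse(x, a, b, mask)
print("max residual of (B.1):", np.abs(w @ x.T - a).max())
print("prescribed weights kept:", np.array_equal(w[~mask], b[~mask]))
```

Calling the function with different values of `b` on the adjustable entries generally returns another valid solution, which illustrates the non-uniqueness noted above. The sketch assumes that every reduced correlation matrix `C_i` is invertible.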

We want to solve (12) for a network with changing connections $w_{ij}$ if $j \in V_i$ and
non-changing connections if $j \in V_i^c$. Applying (B.14) with

$$x_j^{\mu} = \xi_j^{\mu}, \qquad b_{ij} = w_{ij}(t_0), \quad (i,j = 1,2,\dots,N), \qquad a_i^{\mu} = \kappa\,(2\xi_i^{\mu} - 1) + \theta_i \tag{B.15}$$

we obtain at once (57). We thus arrive at the observation that the energy saving solution
(57) coincides with the solution (B.14), obtained with the help of the modified method
of the pseudo-inverse.
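
To illustrate the substitution (B.15), the sketch below stores a few random binary patterns in a partially connected tabula rasa network and checks that the resulting fields satisfy $\sum_j w_{ij}\xi_j^\mu - \theta_i = \kappa(2\xi_i^\mu - 1)$. All numerical values (`N`, `p`, `kappa`, `theta`, the connectivity `mask`) are illustrative assumptions. The final step additionally assumes a deterministic threshold dynamics, in which a neuron becomes active when its field exceeds its threshold; this is not spelled out in this appendix.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p, kappa = 48, 3, 1.0
theta = np.full(N, 0.5)                              # thresholds theta_i (arbitrary choice)
xi = rng.integers(0, 2, size=(p, N)).astype(float)   # xi[mu, j], binary patterns

# Hypothetical diluted architecture: j is in V_i unless (i - j) is a multiple of 3.
idx = np.arange(N)
mask = (np.subtract.outer(idx, idx) % 3) != 0

a = kappa * (2 * xi.T - 1) + theta[:, None]          # a[i, mu] as in (B.15)
w = np.zeros((N, N))                                 # b_ij = w_ij(t_0) = 0 (tabula rasa)
for i in range(N):
    xi_V = xi[:, mask[i]]
    C_i = xi_V @ xi_V.T                              # reduced correlation matrix of neuron i
    w[i, mask[i]] = np.linalg.solve(C_i, a[i]) @ xi_V

fields = w @ xi.T - theta[:, None]                   # fields[i, mu]
print("margin kappa reached:", np.allclose(fields, kappa * (2 * xi.T - 1)))

# Assumed threshold dynamics: every stored pattern should reproduce itself.
recalled = (fields > 0).astype(float).T
print("patterns are fixed points:", np.array_equal(recalled, xi))
```

With $b_{ij} = 0$ the first branch of (B.14) reduces to a single linear solve per neuron, which is why the loop body is so short. Connections outside the index sets keep their initial value zero.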
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